PREFACE


This programmed introduction to statistics presents instruction in a 
logical sequence which allows the student to participate actively in the 
instruction process. It is written for students being introduced for the
first time to statistical techniques and to the application of those
techniques. It may be used for individual study or in undergraduate or
graduate courses. It may be used as the only statistical text for a course
or as an auxiliary text.

The focus is primarily upon the student who is unfamiliar either with 
the basic concepts of statistical techniques or with the mathematics 
needed to apply these techniques. Only a rudimentary knowledge of 
algebra is needed. This program is a beginning course, and stresses 
application; it does not attempt to develop theoretical or mathematical 
derivations of the various techniques. 

Statistics is a difficult subject for many students. The major reasons 
for this may be that too much instruction is given at one time, that the 
material is not logically organized so that the student can follow its
development, or that the student is not actively engaged in the
instructional procedure.

Programmed instruction attempts to solve these problems by allowing
the student to participate actively in every portion of the instructional 
process, by presenting instruction organized so that each step leads in 
logical sequence to the next, and by allowing the student to proceed at 
his own pace through the program—quickly through those areas that 
present no difficulty and more slowly where he feels it necessary. An 
additional advantage in programmed instruction is that after each finite 
step in the program, the student is informed immediately of the accuracy 
of his understanding. This immediate feedback is one of the important 
features of programmed instruction. It provides correction when
necessary; more important, it verifies when the material has been correctly
grasped, thus reinforcing learning.

This material has been field tested thoroughly. It has undergone three 
revisions and has been used by hundreds of students in several types of 
courses involving the study of statistics. The rate of student error in 
frame responses is well under five percent, and before-and-after testing 
has shown gratifying increases in student mastery of statistics. 

This text is logically organized into twenty-four sets. Each set is 
self-contained and can usually be completed at one sitting. A brief set 
introduction describes the contents of each set and presents specific 
objectives for that set. These stated objectives alert the student to the 
important aspects of statistics to be covered in that set. 

Each set contains a series of frames written and sequenced so that 
the required responses can be determined easily. The correct response 
is given immediately below each frame. It is this process of presenting 
the material in sequenced frames, each requiring a positive response, 
then immediately providing a check for the accuracy of the response, 
that constitutes the concept of "programmed" instruction. 

At the end of each set is a series of exercises. They are an integral 
part of the instruction, providing a self-test and giving the student the
opportunity to apply what he has learned. The exercises parallel exactly
the objectives set out in the introduction to each set. 

Another unique feature of this book is the presentation of formulas, 
tables, and a glossary of symbols at the rear of the book. These can be 
removed for convenient reference while the student is working in this 
text and kept for permanent reference later. 

All data presented in this text are fictitious and were developed
specifically to illustrate this program. In order to reduce computational
drudgery, the amount of data presented is kept to the minimum necessary 
to illustrate the statistical techniques. 

Conventional symbolic notation has been used throughout the text so
that it may be used with standard statistical texts without confusion.

INTRODUCTION


In this program you will learn a number of statistical techniques, 
their statistical formulas, and when and how to use them. In order to 
use this program, you need only a rudimentary knowledge of algebraic
procedures (that is, the use of symbols and the solving of simple
algebraic equations). Even if you are weak in these areas, you should be
able to use this program, since much help in computation is given, 
especially in the early sets. 

WHY THIS TEXT IS IN PROGRAMMED FORM 

This text presents a new method for learning statistics in which you:

(1) actively participate in every portion of the instructional process;

(2) are presented with step-by-step instruction organized so that each
step leads logically to the next; and

(3) are allowed to proceed at your own pace through the program, moving
quickly through those areas that present no difficulty and more slowly
where you feel it necessary.

After each finite step in the program, you are immediately informed of the
correctness of your understanding. This immediate feedback is one of
the important features of programmed instruction. It provides correction
when necessary; but more important, it verifies when you have
correctly grasped what is being taught.

HOW TO USE THIS PROGRAMMED TEXT 

This program is divided into twenty-four sets. Each set contains 
three components: (1) an introduction to the set, (2) the programmed 
instruction portion of the set, and (3) a series of exercises. 

(1) The Introduction 

Before each set in the program a brief introductory statement
describes the content of the programmed portion of the set and provides
additional information not contained in the programmed portion. You
should read the introduction to each set carefully since it gives you a 
frame of reference for the instruction that follows. The introduction also 
states the specific objectives to be achieved in the set. These objectives 
alert you to the important aspects of statistics to be presented in the set 
and provide a clear picture of what you can expect to learn from the set. 

(2) The Program 

Each set of the program contains a number of small units, called 
frames. Each frame presents some information and includes a blank 
space which you are to fill in. The correct response to each frame, 
which is given immediately below it, should be kept covered with a card 
or sheet of paper until after you have written your response to the frame. 

You are given some cues as to the type of response required in each
frame. For instance, the number and size of the blanks indicate the
number and size of words required. In some frames there is a series
of alternatives from which you are to choose. For example, "Grass is
_____ (green/red/blue)." Some frames require you to provide a
symbol. This is indicated by the word (symbol) following the blank. For
example, "When you wish to express dollars you use _____ (symbol)."

(3) The Exercises 

At the conclusion of each set there is a series of exercises. The 
exercises in this book are an integral part of the instructional process 
and all are to be completed. They provide a self-test by which you may 
determine whether you have grasped the material in the set. You will 
notice that the exercises exactly parallel the objectives as stated in the
introduction to each set. The additional experience of performing these
exercises increases and reinforces your understanding of what you have
just learned. The answers are provided in the rear of the text.

Formulas, Tables, and a Glossary of Statistical Symbols are presented
at the rear of the text. These will frequently be referred to as
you proceed through the frames of this program. They are perforated 
and should be removed and examined when referred to in the text. The 
Formulas, Tables, and Glossary provide you with a permanent reference 
for further statistical work. 

The data presented in this program are fictitious and are kept to a 
minimum to reduce the amount of computation required. It should be 
pointed out that the usual research study includes much more data than 
is given in the illustrations. 

A WORD OF CAUTION 

For this programmed text to be most effective, you must follow the 
above instructions explicitly. Any short-cut reduces your chance of 
learning and retaining the statistical concepts and procedures covered. 
You should: 

(1) Read the introduction and objectives for each set. 

(2) Read each frame carefully, writing in your response to each 
blank provided in the frame. Keep the correct response below 
the frame COVERED until you have written in your response. 

(3) After responding, check your response with the correct response 
below the frame. If your response is correct, proceed to the 
next frame. If your answer is incorrect, you should review the 
preceding frames in order to find out why you were incorrect. 
Make the necessary correction in your response before 
proceeding. 

(4) Do all the exercises at the end of each set. Check your answers
with those provided at the rear of the text. 

Follow the above instructions and you will be assured of getting the
most out of this programmed text in beginning statistics.

set 1

ORGANIZATION OF DATA: frequency distributions


Statistics is concerned with the analysis of observed facts which have 
been expressed as numbers. These numbers may be test scores, linear 
measurements, frequencies of occurrence, numbers of people, etc. When 
a researcher obtains a quantity of these numbers in order to describe or 
make an inference about a phenomenon, they are usually called data. 

How the researcher obtains these data falls in the realm of research 
design; how he examines these data falls in the realm of statistics. In 
this book we shall be concerned only with the analysis of statistical data,
not with how they were obtained.

We will be concerned initially with procedures for describing data.
Later we will examine some statistical techniques that let us draw
inferences from these data.

The first step in the statistical analysis of a mass of numerical data
is to arrange the data in some order. This set presents definitions of
the terms data, raw score, variability, and frequency, and illustrates
the preparation of frequency distributions of both ungrouped and grouped
data, and the determination of the upper and lower limits for a class
interval.

SPECIFIC OBJECTIVES OF SET 1 

At the conclusion of this set you will be able to: 

(1) prepare a frequency distribution of ungrouped data.

(2) group data into class intervals.

(3) prepare a frequency distribution of grouped data.

(4) identify the real lower and upper limits of any particular raw
score.

(5) identify the symbols X, f, N, i, ll.


1. In education and psychology, measurements of certain characteristics
are obtained for groups of individuals. When you _____ the
intelligence, height, or weight of a group of individuals, you obtain
a set of numbers.


measure 


2. The technical term for a set of numbers is data. A set of numbers
indicating the individual arithmetic scores for a group of children
is called _____.


data


3. The term raw score is used to indicate the measurement, or datum,
obtained for one individual. Thus, the number of items a person
passes on an intelligence test is termed a _____ _____.


raw score


4. If John Smith receives 95 points in an arithmetic test, he is said to
have a _____ _____ of 95 in arithmetic.


raw score


5. Because people vary in their ability to do arithmetic, it is likely that
the raw scores of the group also _____.


vary


6. When you collect data on a number of individuals, you find that the
_____ _____ for all individuals are not the same.


raw scores


Plate 1 

Spelling Scores 

10 

12 

9 

12 

7 

8 

12 

7. This plate gives the spelling scores that seven people received on a
spelling test. These spelling scores may be called _____ _____.


raw scores


8. PLATE 1. To describe this set of raw scores best, you first need
to order them. To do this, you arrange the raw scores in numerical
order with the largest score at the top. Order the raw scores in
Plate 1.


Plate 2 

Spelling Scores 

12 

12 

12 

10 

9 

8 

7 











9. PLATE 2. The highest score value is 12 and the lowest score value
is _____.


7


10. PLATE 2. The largest score value is 12 and the smallest score
value is 7. Therefore, a raw score for any particular individual
in this group must be somewhere between _____ and _____.


7, 12


11. If the largest score value is 12 and the smallest score value is 7,
it is possible for an individual to receive any one of six different
score values. The possible score values are _____, _____, _____, _____,
_____, and _____.


12, 11, 10, 9, 8, 7


12. The number of times a raw score occurs in a set of data is called
the frequency of that score. In Plate 2, the frequency of score value
12 is 3, because this raw score occurs _____ times.


three


13. When you list the frequency with which each score value occurs in a
set of raw scores, you have a frequency distribution of the _____ _____.


raw scores

Plate 3 

Frequency Distribution of 
Intelligence Test Scores 

X f 

105 
104 
103 
102 
101 
100 
99 
98 

14. This plate gives the _____ _____ for a set of raw scores.


frequency distribution


15. PLATE 3. Each possible score value is identified in the X column,
and the frequency of each score value is listed in the _____ (symbol)
column.


f


16. PLATE 3. The symbol used for the score value is _____ (symbol).


X


17. PLATE 3. The frequency for score value 99 is _____.


2


18. PLATE 3. From this plate you learn that the symbol f represents the
_____ of each score value in the frequency distribution.


frequency


19. PLATE 3. For the score value of 100, the f is zero because _____
subjects have the raw score of 100.


no


20. PLATE 3. From this plate you learn that N is used to represent the
total number of _____ _____ in the frequency distribution.


raw scores

21. Prepare the frequency distribution for the raw scores presented in
Plate 2. (Use Plate 3 as a guide.)


Plate 4

Frequency Distribution of
Spelling Scores

X     f

12    3
11    0
10    1
9     1
8     1
7     1

N = _____


22. PLATE 4. N = _____.


7


23. PLATE 4. N = 7 because 7 is the total _____ _____ _____
for which there are spelling scores.


number of individuals
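
If you have a computer at hand, you can check the tallying of frames 21 through 23 with a short program. The following Python sketch is not part of the original program. It assumes whole-number raw scores, and the function name frequency_distribution is only an illustrative choice. It lists every score value from the highest to the lowest, including those with an f of zero, and reports N.

```python
from collections import Counter


def frequency_distribution(raw_scores):
    """Return (rows, N), where rows are (X, f) pairs from highest X to lowest."""
    counts = Counter(raw_scores)
    highest, lowest = max(raw_scores), min(raw_scores)
    # Every possible score value is listed, even when its f is zero.
    rows = [(x, counts.get(x, 0)) for x in range(highest, lowest - 1, -1)]
    return rows, len(raw_scores)


spelling_scores = [10, 12, 9, 12, 7, 8, 12]  # raw scores of Plate 1
rows, n = frequency_distribution(spelling_scores)
print("X", "f")
for x, f in rows:
    print(x, f)
print("N =", n)
```

The rows printed should agree with Plate 4, and N should agree with your response to frame 22.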


24. PLATE 4. For the score value of 11, the f is zero because there
are _____ individuals having the raw score of 11.


no


25. PLATE 4. For the score value of 12, f = 3 because there are
_____ individuals having the raw score of 12.


three


26. In a frequency distribution, each score value is listed in the _____
(symbol) column and the frequency for each score is listed in the
_____ (symbol) column.


X, f


27. When the frequency distribution presents the f for each separate
score value, the distribution is for ungrouped data.

Plate 5

Frequency Distribution for Ungrouped Data

X     f        X     f        X     f

30    1        23    0        16    1
29    1        22    2        15    1
28    0        21    2        14    0
27    1        20    1        13    0
26    2        19    3        12    0
25    1        18    0        11    1
24    0        17    1        10    1

N = 19


This plate gives the frequency distribution for _____ data.


ungrouped


28. PLATE 5. This plate gives the frequency distribution for ungrouped
data because the f for each _____ value is listed separately.


score


29. Frequencies are sometimes presented for groups of score values.
A distribution in which the score values are grouped is called a
frequency distribution for _____ data.


grouped


30. When a large range of score values is involved in a frequency
distribution, it becomes desirable to _____ the score values into
what are called class intervals.


group


31. PLATE 5. These data can be _____ into class intervals
encompassing, for example, three score values each.


grouped


Plate 6

Frequency Distribution for
Grouped Data

i        f

28-30    2
25-27    4
22-24    2
19-21    6
16-18    2
13-15    1
10-12    2

N = 19


32. This plate presents a frequency distribution for _____
data. This distribution is for the same data that were presented in
Plate 5 as a distribution for ungrouped data.


grouped


33. PLATE 5. The frequencies of the three score values of 10, 11, and
12 are:

X     f

12    0
11    1
10    1

The sum of the f's for these three score values is _____.


2


34. PLATE 5. When the f's of these three score values are grouped
into the class interval (i) 10-12, the f for this interval is _____.

X     f        i        f

12    0
11    1        10-12    2
10    1


2


35. The symbol for class interval is i. The i equals the number of score
values represented in the _____ _____.


class intervals


36. When the f's for the three score values of 10, 11, and 12 are grouped
for the class interval 10-12, i = _____.


3


37. For the class interval 11-15, i = _____.


5


38. For class interval 11-15, i = 5 because there are _____ score
values encompassed in the interval.


five


39. The five score values encompassed in the class interval 11-15 are
_____, _____, _____, _____, and _____.


11, 12, 13, 14, 15









40. Determine the f for the class interval 12-14.

Ungrouped data        Grouped data

X     f               i        f

14    3               12-14    _____
13    2
12    3


8


41. PLATE 5. Group the data by class intervals (i) containing three
score values each, beginning with i 10-12, and prepare the frequency
distribution.


Plate 6

Frequency Distribution for
Grouped Data

i        f

28-30    2
25-27    4
22-24    2
19-21    6
16-18    2
13-15    1
10-12    2

N = 19
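
The grouping asked for in frame 41 can also be checked by program. The sketch below is an illustration only and is not part of the original text. It assumes the ungrouped distribution of Plate 5 is entered as a Python dictionary of X and f, and the name group_into_class_intervals is hypothetical. Each class interval takes in i consecutive score values, and its f is the sum of the f's of those values.

```python
def group_into_class_intervals(ungrouped, i, lowest):
    """Group an ungrouped distribution (dict of X -> f) into class intervals.

    i is the number of score values in each class interval and lowest is the
    lowest score value of the bottom interval. Rows run from the top interval
    down, in the same order as Plate 6.
    """
    highest = max(ungrouped)
    rows = []
    low = lowest
    while low <= highest:
        high = low + i - 1
        # The f of the interval is the sum of the f's of its score values.
        f = sum(ungrouped.get(x, 0) for x in range(low, high + 1))
        rows.append((f"{low}-{high}", f))
        low += i
    rows.reverse()
    return rows


plate_5 = {
    30: 1, 29: 1, 28: 0, 27: 1, 26: 2, 25: 1, 24: 0,
    23: 0, 22: 2, 21: 2, 20: 1, 19: 3, 18: 0, 17: 1,
    16: 1, 15: 1, 14: 0, 13: 0, 12: 0, 11: 1, 10: 1,
}
grouped = group_into_class_intervals(plate_5, i=3, lowest=10)
for interval, f in grouped:
    print(interval, f)
print("N =", sum(f for _, f in grouped))
```

Note that grouping changes the f listed on each line but not N; the total number of raw scores is the same in Plate 5 and Plate 6.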


42. Frequency distributions for grouped data are usually presented when
there are _____ (few/many) different score values in the
distribution.


many








43. NOTE: For simplicity of presentation, the examples here are for
small groups with few score values involved. Typically, you do not
group the data when you have fewer than fifteen or twenty different
score values in the distribution.

go on to next frame


44. The real limits of each score value lie .5 score points above and .5
score points below the score value. Thus, for score value 10, the
real upper limit is 10.5 and the real lower limit is _____.


9.5


45. 11.5 and 12.5 are the real limits of the score value _____.


12


46. The symbol for the lower limit of a score value is ll. The ll for
the score value 16 is _____.


15.5


47. 9.5 is the _____ (symbol) of the score value 10.


ll


48. For frequency distributions of grouped data, the real lower limit
(ll) of a class interval is _____ score points below the lowest score
value encompassed in the interval.


.5











49. In addition to representing the lower limit of a single score value,
the symbol ll also represents the lower limit of a class interval.
7.5 is the _____ (symbol) of the class interval (i) 8-10.


ll


50. The real upper limit of i 8-10 is _____.


10.5


51. PLATE 6. The ll of the top-most i is _____.


27.5


52. The real lower and upper limits of i 25-27 are _____ and _____.


24.5, 27.5


53. PLATE 6. The real lower and upper limits of the entire frequency
distribution are _____ and _____.


9.5, 30.5


54. PLATE 6. In this plate, i is equal to three score points because
each class interval encompasses three score values.


Plate 7

i        f

26-30    3
21-25    4
16-20    6
11-15    3
6-10     3
1-5      2

N = 21
i = _____


5


55. PLATE 7. The i = 5 because there are _____ score values
encompassed in each i.


five


56. The value of i for a frequency distribution containing the i 50-59
is _____.


10
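
The rules for real limits and for the value of i in the frames above amount to two small calculations. The Python sketch below is offered only as a check on your own work. It is not part of the original program, it assumes whole-number score values, and the names real_limits and interval_size are illustrative.

```python
def real_limits(lowest, highest=None):
    """Return (real lower limit, real upper limit).

    Call with one score value, or with the lowest and highest score values
    encompassed in a class interval.
    """
    if highest is None:
        highest = lowest
    # Real limits lie .5 score points below and above the stated values.
    return lowest - 0.5, highest + 0.5


def interval_size(lowest, highest):
    """Return i, the number of score values encompassed in a class interval."""
    return highest - lowest + 1


print(real_limits(10))        # a single score value
print(real_limits(8, 10))     # the class interval 8-10
print(interval_size(50, 59))  # i for the class interval 50-59
```

The same function serves for an entire distribution: give it the lowest and highest score values of the distribution, as in frame 53.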









EXERCISES 


1. Prepare an ungrouped frequency distribution for the following set 
of raw scores. What is the N for this distribution? 


/ 


10 6 

11 11 

8 9 

5 2 


15 8 

12 9 

7 1 

5 


2. Prepare a frequency distribution for the above set of raw scores, 
grouping them into class intervals with i = 3. 


3. Prepare a frequency distribution for the following set of raw scores.
Let i = 4. What is the N of this distribution?


31 

33 

23 

29 

44 

30 

35 

33 

36 

40 

21 

34 

38 

37 

35 

38 

41 



4. Determine the real lower limits for the following raw scores.

9 107 1 36 

5. Determine the real lower limits for the following class intervals. 

11-15 1-3 4-9 


6. What do the following symbols represent?

X    i

f    ll

N


set 2

MEASURES OF

CENTRAL TENDENCY: mode and median


Data that have been collected and organized into a frequency distribution 
are ready for statistical analysis. The first step in the analysis of these 
data is to determine the one score value that best characterizes the entire 
frequency distribution. In statistics this value is called a measure of 
central tendency because it is the central value around which the scores 
tend to cluster. In this set two measures of central tendency are
presented, the mode and the median, along with methods for determining
these figures for ungrouped and for grouped data.

SPECIFIC OBJECTIVES OF SET 2

At the conclusion of this set you will be able to:

(1) define the term mode.

(2) define the term median.

(3) locate the mode of a frequency distribution of ungrouped data.

(4) locate the mode of a frequency distribution of grouped data.

(5) calculate the median for grouped data using Formula 1.

(6) calculate the median for ungrouped data using Formula 1.

(7) identify the symbols Mdn, Σ, Σf_b.



1. When you have a large number of raw scores in a frequency
distribution, they tend to group at some central or representative score.
This central score is called a measure of central tendency. One
measure of _____ _____ is called the mode.


central tendency


2. The mode is defined as the most recurring or most frequent score
in a frequency distribution.

Plate 8 


X 

12 

11 

10 

9 

8 

7 



The modal score or mode for this frequency distribution is
score _____.


12


3. PLATE 8. The score value 12 is the mode of the distribution because
it is the most _____ score in the distribution.


frequent


4. PLATE 3, page 10. The most recurring or most frequent score
for this frequency distribution is score _____. This score is the
_____ of the distribution.


102, mode








5. Since it is a measure of central tendency, the mode tells you nothing
about the range of raw scores or their variability in the distribution.
It only tells you which raw score occurs most _____.


frequently


6. For grouped data, the i that contains the largest f is called the mode
of that distribution.

PLATE 6, page 14. The interval _____ is the mode of the
distribution.


19-21


7. The interval 19-21 is the mode of the distribution because it is the
interval that contains the _____ f of scores.


largest


8. PLATE 7, page 19. The mode of the distribution is i _____.


16-20
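
Locating the mode requires no formula, only a search for the largest f. For readers who wish to check a long distribution by program, here is a minimal Python sketch. It is not part of the original text, the function name mode is an illustrative choice, and the handling of ties is an assumption, since the frames do not discuss a distribution in which two values share the largest f.

```python
def mode(distribution):
    """Return the score value or class interval that has the largest f.

    distribution is a list of (value_or_interval, f) rows. If several rows
    share the largest f, the first one listed is returned (an assumption).
    """
    return max(distribution, key=lambda row: row[1])[0]


plate_6 = [("28-30", 2), ("25-27", 4), ("22-24", 2), ("19-21", 6),
           ("16-18", 2), ("13-15", 1), ("10-12", 2)]
plate_7 = [("26-30", 3), ("21-25", 4), ("16-20", 6),
           ("11-15", 3), ("6-10", 3), ("1-5", 2)]

print(mode(plate_6))
print(mode(plate_7))
```

The two results should match your responses to frames 6 and 8.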


9. Another measure of central tendency is the median. It is defined as
the midpoint, or center, of the distribution. Thus, it is the point
above which and below which _____ % of the raw scores lie.


50


10. The median differs from the mode in that the median is the
_____ of the distribution, whereas the mode is the score
value or class interval that has the largest f.


midpoint


11. The median of a distribution in which there are seven scores will
be the fourth score because _____ scores lie above and
_____ scores lie below this score.


three, three


12. PLATE 8. The symbol Mdn stands for the median. The score 10
is the _____ (symbol) of the distribution.


Mdn


13. PLATE 8. Score 10 is the Mdn because it is the _____ score
in the frequency distribution.


middle


14. PLATE 3, page 10. The score 102 is both the _____ and
_____ of the frequency distribution.


mode, median




Plate 9

X     f

90    3
89    3
88    1
87    4
86    1
85    1

N = 13


15. The mode of the frequency distribution is _____. The Mdn is _____.


87, 88













16. PLATE 9. Score 87 is the _____ of the distribution because it is
the most _____ score.


mode, recurring


17. PLATE 9. Score 88 is the _____ (symbol) of the distribution
because six raw scores are above and six raw scores are below it.


Mdn


18. Formula 1 presents the formula for the Mdn and the definitions of
the new symbols. One of the new symbols introduced is the capital
Greek letter Σ, which is pronounced "sigma" and means _____ _____ _____.


the sum of


19. FORMULA 1. The symbol ll means the _____ _____ of
the class interval which contains the Mdn.


lower limit


20. FORMULA 1. The symbol Σf_b means the _____ of the frequencies
below the class interval which contains the Mdn.


sum


21. FORMULA 1. The symbol f_w equals the _____ within
the class interval which contains the Mdn.


frequency











22. FORMULA 1. The symbol i represents the _____ of score
values encompassed by each class interval in the distribution.


number


23. PLATE 6, page 14. The i = _____ because there are three score values
encompassed by each class interval.


3


24. FORMULA 1. .5N means that you multiply the value of N by _____.


.5


25. .5 is the decimal form of _____ %.


50


26. .5N is used in the formula for the Mdn because you wish to determine
the score above which and below which _____ % of the raw scores lie.


50


27. Because the Mdn is the score above and below which 50% of the raw
scores lie, the i which contains the Mdn is the i which contains the
score falling at the _____ of the frequency distribution.


center


28. PLATE 6, page 14. The i which contains the Mdn is i _____
because the midpoint of the frequency distribution is contained in
it.


19-21













29. PLATE 6, page 14. The ll of the i containing the Mdn is _____.


18.5


30. PLATE 6, page 14. 18.5 is the ll to be used in Formula 1, because
it is the lower limit of the interval _____, which is the i
containing the Mdn.


19-21


31. PLATE 6, page 14. Σf_b = _____ because it is the sum of the
frequencies in the intervals below the interval which contains the Mdn.


5


32. PLATE 6, page 14. Σf_b = 5 because it is the _____ of the f's which
appear in the three class intervals below the interval which contains
the Mdn.


sum


33. PLATE 6, page 14. f_w = _____ because it is the frequency within the
interval 19-21 which contains the Mdn.


6


34. PLATE 6, page 14. In order to use Formula 1 to determine the
Mdn, first determine the numerical values of: ll = _____,
N = _____, and Σf_b = _____.


18.5, 19, 5


35. PLATE 6, page 14. ll = 18.5, N = 19, Σf_b = 5. Also determine
the numerical values of f_w = _____, and i = _____.


6, 3


36. PLATE 6, page 14. ll = 18.5, N = 19, Σf_b = 5, f_w = 6, i = 3.
Substitute these numerical values for the symbols in Formula 1.

Mdn = ll + ((.5N − Σf_b) / f_w) i


Plate 10

Mdn = 18.5 + ((.5(19) − 5) / 6) 3


37. PLATE 10. Do the necessary arithmetic to determine the value of
the Mdn.


Mdn = 18.5 + (4.5 / 6) 3

20.75


38. PLATE 6. The Mdn is 20.75 because it is that point in the frequency
distribution above which and below which _____ % of the raw scores lie.


50
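
The steps of frames 28 through 37 can be gathered into one routine if you wish to check your arithmetic. The Python sketch below is an illustration under stated assumptions and is not part of the original program. It takes Formula 1 to be Mdn = ll + ((.5N − Σf_b) / f_w) i, as the frames describe it, assumes whole-number score values, and expects the class intervals to be entered from the lowest interval up. The name median_formula_1 is hypothetical.

```python
def median_formula_1(intervals):
    """Median by Formula 1.

    intervals is a list of (lowest, highest, f) rows, entered from the
    lowest class interval up. For ungrouped data lowest equals highest,
    so that i = 1.
    """
    n = sum(f for _, _, f in intervals)
    half_n = 0.5 * n
    sum_f_below = 0  # running sum of the f's below the current interval
    for lowest, highest, f_within in intervals:
        if f_within > 0 and sum_f_below + f_within >= half_n:
            ll = lowest - 0.5       # real lower limit of this interval
            i = highest - lowest + 1
            return ll + (half_n - sum_f_below) * i / f_within
        sum_f_below += f_within
    raise ValueError("the distribution contains no raw scores")


plate_6 = [(10, 12, 2), (13, 15, 1), (16, 18, 2), (19, 21, 6),
           (22, 24, 2), (25, 27, 4), (28, 30, 2)]
plate_7 = [(1, 5, 2), (6, 10, 3), (11, 15, 3),
           (16, 20, 6), (21, 25, 4), (26, 30, 3)]

print(round(median_formula_1(plate_6), 2))
print(round(median_formula_1(plate_7), 2))
```

The routine first finds the interval in which the running total of frequencies reaches .5N, which is the interval containing the Mdn, and only then applies the formula. This is the same order of work the frames ask of you.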


39. PLATE 7, page 19. The i which contains the Mdn is _____
because the midpoint of the frequency distribution is contained in it.


16-20


40. PLATE 7. The ll of the i containing the Mdn is _____.


15.5


41. PLATE 7. 15.5 is the ll to be used in Formula 1 because it is the
_____ (symbol) of the i containing the Mdn.


42. PLATE 7. Σf_b = _____ because it is the sum of the f's below the
interval 16-20.


8


43. PLATE 7. f_w = _____ because it is the f of the i 16-20 which contains
the Mdn.


6


44. PLATE 7. Σf_b = 8, f_w = 6. Also determine the numerical values:
ll = _____, N = _____, i = _____.


15.5, 21, 5


45. PLATE 7. ll = 15.5, N = 21, Σf_b = 8, f_w = 6, i = 5.
Substitute these numerical values for the symbols in Formula 1.


Plate 11

Mdn = 15.5 + ((.5(21) − 8) / 6) 5


46. PLATE 11. Do the necessary arithmetic to determine the value of
the Mdn of Plate 7. Mdn = _____.


17.58


47. PLATE 7. 17.58 is the Mdn because it is that point in the frequency
distribution above which and below which _____ % of the raw scores
lie.


50


48. The computations of the medians of the distributions in Plates 6 and 7
have been for _____ (grouped/ungrouped) data.


grouped


49. Formula 1 can also be used for computing the Mdn for ungrouped
data. Since, for ungrouped data, frequencies are presented for
each score value, the i in Formula 1 for ungrouped data is equal
to _____.


1


50. PLATE 9. In order to compute the Mdn, determine:
ll = _____, N = _____, Σf_b = _____.


87.5, 13, 6


51. PLATE 9. ll = 87.5, N = 13, Σf_b = 6.
Determine: f_w = _____, i = _____.


1, 1


52. PLATE 9. ll = 87.5, N = 13, Σf_b = 6, f_w = 1, i = 1.
Using Formula 1, determine the Mdn of the frequency distribution.
Mdn = _____.


88


Plate 12

X     f

7     1
6     1
5     3
4     7
3     4
2     2
1     2


53. Determine the Mdn using Formula 1. Mdn = _____.


3.79


Plate 13

i        f

26-30    10
21-25    15
16-20    20
11-15    16
6-10     14
1-5      12


54. Determine the Mdn using Formula 1. Mdn = _____.


15.88

55. The two measures of central tendency discussed thus far are the
_____ and the _____.


mode, median


56. The median differs from the mode in that the median is the
_____ of the distribution, whereas the mode is the score
value or class interval that has the largest _____.


midpoint, frequency


EXERCISES


1. What is meant by the term mode of a frequency distribution? 

2. What is meant by the term median of a frequency distribution? What 
is its symbol? 

3. Locate the mode in the following frequency distributions. Locate
the median, using Formula 1.

(a) X     f      (b) X      f      (c) i        f      (d) i        f

    10    2          109    6          91-94    4          40-49    1
    9     3          108    5          87-90    8          30-39    6
    8     2          107    2          83-86    9          20-29    5
    7     2          106    3          79-82    7          10-19    7
    6     2          105    2          75-78    2          0-9      2
    5     1          104    2





4. What do the following symbols represent? 



set 3

MEASURES OF
CENTRAL TENDENCY: mean


We shall now consider the third measure of central tendency, the
mean. Unlike the mode and median, the mean lends itself to further and
more sophisticated statistical analysis and is therefore the most useful
of the three measures of central tendency. This set defines the mean
and presents the formula for calculating it from both grouped and
ungrouped data. This set will also describe a method for simplifying the
computation of the mean by reducing the value of each of the raw scores.

SPECIFIC OBJECTIVES OF SET 3 

At the conclusion of this set you will be able to: 

(1) define the term mean.

(2) calculate the mean for ungrouped data, using Formula 2.

(3) determine the midpoint of a class interval.

(4) calculate the mean for grouped data, using Formula 2.

(5) calculate the mean for ungrouped data after reducing each raw
score by a constant amount to simplify the calculation.

(6) calculate the mean for grouped data after reducing the midpoint
of the class intervals by a constant amount to simplify the
calculation.

(7) identify the symbol μ.


1. The third measure of central tendency, the mean, is the arithmetic
average of the raw scores. Thus, the average IQ score of a group
of subjects is called the _____ of the group.


mean 


2. To obtain the mean, or arithmetic average, you sum all of the raw
scores and divide by the ____ of raw scores involved.


number 


3. The symbol that represents the mean is μ. The ____ (symbol)
represents the arithmetic average of the raw scores.


μ


4. To obtain the mean (μ), first you must sum all of the ____
in the distribution.


raw scores 


5. Since Σ is the symbol for "the sum of" and X represents the value of
the raw score, the symbol for the sum of the raw scores is written
____.


ΣX


6. To obtain the μ you must divide the sum of the raw scores by the
number of scores in the frequency distribution. Complete the formula
for the mean by supplying the missing symbol.

μ = ΣfX / ____


N
7. Formula 2 presents the formula for the computation of the mean.
The symbol ΣfX indicates that you multiply each score value by its
f, and then ____ all of the products.


sum 


8. PLATE 4, page 11. There is an f of 3 for the score value 12.
Therefore, in summing the raw scores, the value of 12 must be
included ____ times.


three 


9. A simple method of determining the sum of the raw scores in a
frequency distribution is to multiply each X by its f and then sum
these products. This operation is represented in Formula 2 by the
symbol ____.


ΣfX


10. According to Formula 2, when you sum the raw scores and then 
divide by N, you obtain the_of the frequency distribution. 


mean 


11. PLATE 8, page 23. To use Formula 2 in determining the μ of the
frequency distribution, you must first determine fX for each score
value. To do this, you must multiply each score value by its ____
(symbol).


f


12. PLATE 8, page 23. For the score value of 12, the f = 3; therefore,
to determine fX for this score value, you must multiply ____ by ____.


12, 3 
13. PLATE 8, page 23. The fX for the score value of 12 is ____.

36

14. PLATE 8, page 23. The fX for the score value of 12 is 36 because
the f of this score value is ____.

3

15. PLATE 8, page 23. Determine the fX for score value 11.

zero

16. PLATE 8, page 23. The fX for score value 11 is zero because the
f for this score value is ____.

zero

17. PLATE 8, page 23. Determine the fX of score value 10.

10

18. PLATE 8, page 23. The fX for score value 10 is 10 because the f
of this score value is ____.

1

19. PLATE 8, page 23. Determine the value of fX for each of the score
values of 9, 8, and 7.

9, 8, and 7, respectively

20. PLATE 8, page 23. The fX for each of the score values of 9, 8,
and 7 is 9, 8, and 7 because the f for each of these score values is
____.


1


Plate 14

 X    f    fX
12    3    36
11    0     0
10    1    10
 9    1     9
 8    1     8
 7    1     7

N = 7   ΣfX = ____


21. This plate presents the frequency distribution of Plate 8 with an
additional column giving the fX for each score value. Because Σ
means "the sum of," the ΣfX for this distribution is the ____ of
all the fX values in the distribution.


sum 


22. PLATE 14. The ΣfX for the frequency distribution is ____.


70 


23. PLATE 14. The ΣfX is 70 because it is the sum of each X times its
____ (symbol).


f


24. Formula 2 indicates that to obtain the mean (μ) it is necessary that
ΣfX be divided by ____ (symbol).


N 
25. PLATE 14. ΣfX = 70, N = 7.

Using Formula 2, substitute these numerical values for the symbols 
and calculate the mean. 




10 
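
The arithmetic of Formula 2 can be checked mechanically. What follows is a minimal Python sketch and not part of the original program; the function name `mean_from_frequencies` and the list-of-pairs layout are assumptions made only for illustration. It multiplies each X by its f, sums the products, and divides by N, using the Plate 14 data.

```python
def mean_from_frequencies(table):
    """Formula 2: mean = (sum of f * X) / N, where N is the sum of the f column.

    `table` is a list of (X, f) pairs; this layout is an illustrative choice.
    """
    sum_fx = sum(x * f for x, f in table)   # the ΣfX column total
    n = sum(f for _, f in table)            # N, the number of raw scores
    return sum_fx / n


# Plate 14: score values with their frequencies
plate_14 = [(12, 3), (11, 0), (10, 1), (9, 1), (8, 1), (7, 1)]
print(mean_from_frequencies(plate_14))  # 70 / 7
```

A score value with an f of zero, such as 11, contributes nothing to ΣfX or to N, which matches frames 15 and 16.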


26. PLATE 12, page 32. To determine the μ of the distribution, first
determine fX for each score value.


7, 6, 15, 28, 12, 4, 2 


27. PLATE 12, page 32. The fX for each of the score values is 7, 6,
15, 28, 12, 4, and 2. The ΣfX = ____ because it is the sum of the fX for
each score value.


74


28. PLATE 12, page 32. N = ____


20 


29. PLATE 12, page 32. ΣfX = 74, N = 20.

Using Formula 2, determine the μ of the distribution.

μ = ____


3.7 
Plate 15

 X    f    fX
25    2
24    1
23    3
22    1
21    4

N = 11   ΣfX = ____


30. To determine the μ of the distribution, first determine fX for each
score value.


50, 24, 69, 22, 84 


31. PLATE 15. The fX for each of the score values is 50, 24, 69, 22, 84.
The ΣfX is ____ because it is the sum of the fX for each score
value.


249 


32. PLATE 15. N = 11, ΣfX = 249.

Use Formula 2 to determine the mean of the distribution. μ = ____


22.6 


33. To determine the μ more simply, subtract a constant amount from
each score value before computing ΣfX, then add it to the obtained
mean.

PLATE 15. If you subtract 20 from each of the five score values,
you obtain the five score values of ____, ____, ____, ____, and ____.


5, 4, 3, 2, 1 
Plate 16

Data of Plate 15 with Each X Reduced by Twenty Points

(X - 20)   f    fX
    5      2
    4      1
    3      3
    2      1
    1      4

N = 11   ΣfX = ____

34. This plate presents the data of Plate 15 with a constant of 20
subtracted from each score value. Compute fX for each score value in
Plate 16 and then determine the ΣfX for this distribution.


10, 4, 9, 2, 4; ΣfX = 29


35. PLATE 16. N = 11, ΣfX = 29.

To determine the μ of a frequency distribution in which a constant
amount has been subtracted from each score value, it is necessary
to add the constant to the obtained mean. The obtained mean in
Plate 16 is ____.


2.6 


36. PLATE 16. The obtained mean is 2.6.

Remember, you have subtracted a constant of 20 points from each
score value, so it is necessary to add 20 to the obtained mean in
order to determine the true mean of the distribution. μ = 2.6 + 20 = ____


22.6 
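
The shortcut of frames 33 to 36 can be sketched in Python as a check that subtracting a constant and adding it back leaves the mean unchanged. This is an illustrative sketch only; the name `coded_mean` is an assumption and does not come from the program.

```python
def coded_mean(table, constant):
    """Mean of (X, f) pairs computed after reducing every X by `constant`.

    The obtained mean of the reduced scores is corrected by adding the
    constant back, as frame 36 describes.
    """
    reduced = [(x - constant, f) for x, f in table]
    n = sum(f for _, f in reduced)
    obtained = sum(x * f for x, f in reduced) / n   # 29 / 11 for Plate 16
    return obtained + constant


plate_15 = [(25, 2), (24, 1), (23, 3), (22, 1), (21, 4)]
direct = sum(x * f for x, f in plate_15) / sum(f for _, f in plate_15)  # 249 / 11
print(round(direct, 1), round(coded_mean(plate_15, 20), 1))
```

Both routes give the same value before rounding; the text rounds the obtained mean to 2.6 first and then adds 20.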


37. To compute the μ of a frequency distribution containing grouped
data, you must first determine the one score value that best
represents each i. The midpoint of each i is used for this purpose. For
i 16-20, the midpoint is ____.


18 
38. The midpoint of i 16-20 is 18; and therefore, 18 is used as the score
value that best ____ the interval.


represents 


39. Determine the midpoint of i 1-5. 


3 


40. The midpoint of i 1-5 is 3 because ____ score values
lie above it and ____ score values lie below it.

1 2 3 4 5
    ↑
midpoint


two, two

41. The midpoint of i 1-6 is 3.5 because ____ score values
lie above it and ____ score values lie below it.

1 2 3 4 5 6
     ↑
midpoint


three, three 

42. When the i encompasses an even number of score values, the mid¬ 
point always ends in .5. When the i encompasses an odd number 

of score values, the midpoint is always the_score 

value of the i . 


middle 
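
The midpoint rule of frames 37 to 42 amounts to averaging the lowest and highest score values of the interval. The short Python sketch below is illustrative only; the function name `midpoint` is an assumption.

```python
def midpoint(lowest, highest):
    """Midpoint of a class interval given its lowest and highest score values."""
    return (lowest + highest) / 2


print(midpoint(16, 20))  # odd number of score values: the middle score value
print(midpoint(1, 5))
print(midpoint(1, 6))    # even number of score values: ends in .5
```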
43. PLATE 6, page 14. To determine the μ of the frequency
distribution, first determine the midpoint of each i in the distribution.
The midpoints are ____, ____, ____, ____, ____, ____, and ____.


29, 26, 23, 20, 17, 14, 11 


44. PLATE 6, page 14. The midpoint of each i is used as the X value 
in Formula 2 because these midpoints are the score values that 

best_the score values encompassed by the 

intervals. 


represent 


Plate 17

  i       X    f
28-30    29    2
25-27    26    4
22-24    23    2
19-21    20    6
16-18    17    2
13-15    14    1
10-12    11    2

N = 19   ΣfX = ____


45. This plate presents the frequency distribution of Plate 6 and an
additional column, X, which is the ____ of each i.


midpoint


46. For grouped data, the midpoint of each interval is used as ____ (symbol)
in Formula 2 to determine the mean.


X







47. PLATE 17. (1) Determine each fX.

(2) Determine ΣfX.

(3) Determine μ using Formula 2.


(1) 58, 104, 46, 120, 34, 14, 22 

(2) 398 

(3) 20.9 
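
For grouped data the same Formula 2 applies once each interval is replaced by its midpoint. The following Python sketch works through Plate 17; the name `grouped_mean` and the (lowest, highest, f) tuple layout are assumptions for illustration.

```python
def grouped_mean(intervals):
    """Formula 2 for grouped data, using each interval's midpoint as X.

    `intervals` is a list of (lowest, highest, f) tuples.
    """
    n = sum(f for _, _, f in intervals)
    sum_fx = sum(((low + high) / 2) * f for low, high, f in intervals)
    return sum_fx / n


plate_17 = [(28, 30, 2), (25, 27, 4), (22, 24, 2), (19, 21, 6),
            (16, 18, 2), (13, 15, 1), (10, 12, 2)]
print(round(grouped_mean(plate_17), 1))  # 398 / 19, rounded to one decimal
```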


48. The method presented earlier for simplifying the arithmetic
involved in the calculation of the μ was to reduce each score value
by a constant amount. This can also be done with grouped data by
reducing the ____ of each i by a constant amount.


midpoint 


49. PLATE 17. Each X value could be reduced by a constant of 10, in
which case the obtained mean would have to be ____
(increased/decreased) by 10.


increased 


Plate 18

  i       X    (X - 10)    f    fX
28-30    29       19       2    38
25-27    26       16       4    64
22-24    23       13       2    26
19-21    20       10       6    60
16-18    17        7       2    14
13-15    14        4       1     4
10-12    11        1       2     2

N = 19   ΣfX = 208

obtained μ = 208/19 = 10.9

true μ = 10.9 + 10 = 20.9



50. This plate presents the same data as Plate 17, but each X value has 
been reduced by 10. The reduced X values are presented in the 
third column of the plate. The obtained mean has been increased 
by_in order to obtain the true mean. 


10 
51. PLATE 18. The μ = ____, which is the same as that obtained
by the longer method when computing from the distribution
presented in Plate 17.


20.9 


Plate 19

Raw Scores

20   28   25   26   30   26   21
34   31   29   22   28   32   24



52. Using Plate 18 as a guide, prepare a frequency distribution of the
raw scores presented in Plate 19. Use an interval of three score
values each. Simplify your calculations by using a constant of 20.
Compute the μ of the distribution.


Plate 20

  i       X    (X - 20)    f    fX
32-34    33       13       2    26
29-31    30       10       3    30
26-28    27        7       4    28
23-25    24        4       2     8
20-22    21        1       3     3

N = 14   ΣfX = 95

μ = 95/14 + 20 = 6.79 + 20 = 26.79
EXERCISES


1. What is meant by the term mean of a frequency distribution? What
symbol denotes mean?

2. Determine the midpoint of each of the following intervals.

i 1-5     i 5-6     i 101-110     i 80-84

3. Calculate the mean for the following frequency distributions, using
Formula 2.


(a)  X    f     (b)  X    f     (c)  i      f     (d)  i      f
     9    1          59   3         21-25   4         57-60   2
     8    2          58   4         16-20   9         53-56   5
     7    4          57   6         11-15  10         49-52   9
     6    3          56   2          6-10   3         45-48   3
     5    2          55   2          1-5    2         41-44   2
     4    2

4. Simplify the calculation of the mean of the frequency distribution 
presented in 3(b) by subtracting a constant of 50 from each raw 
score. Check your answer with your answer to 3(b). 

5. Simplify the calculation of the mean of the frequency distribution
presented in 3(d) by subtracting a constant of 40 from each class
interval. Check your answer with your answer to 3(d).
set 4

COMPARISON OF MEASURES
OF CENTRAL TENDENCY

MEASURES OF VARIABILITY: percentiles


We have just shown how we can simplify our calculations by subtracting 
a constant from each raw score. We can make similar adjustments by 
adding, multiplying, and dividing by a constant, as shown in the first part 
of this set. The procedure of using constants can eliminate computation 
with negative scores. 

The mode, median, and mean are affected differently by changes in
extreme scores in the frequency distribution. This set will also compare
the ways in which these three measures of central tendency are affected
by the alteration of extreme scores.

It is sometimes useful to convert a raw score into a percentile rank 
which tells what percentage of scores lies below that score. The pro¬ 
cedure for obtaining the percentile rank of a raw score in a distribution 
is given, as well as the definition of quartiles and deciles. 

SPECIFIC OBJECTIVES OF SET 4 

At the conclusion of this set you will be able to:

(1) simplify the calculation of the mean of a frequency distribution
containing negative scores, by adding a constant to each score.

(2) determine the effect of extreme score values on the value of the
mean, median, and mode.

(3) define the term percentile.

(4) use Formula 1 to determine the score which lies at a given
percentile for grouped data.

(5) use Formula 1 to determine the score which lies at a given
percentile for ungrouped data.

(6) define the terms quartile and decile.

(7) use Formula 3 to determine the percentile of a specific score
in a frequency distribution.

(8) identify the symbols Q1, Q2, Q3, and D.
1. Thus far you have been subtracting a constant from each score value
in order to simplify the calculation of μ. You may also add a
constant. Of course, when you add a constant, you must ____
the same constant from the obtained mean to determine the true
mean.


subtract 


2. You may also multiply each score value by a constant. In this case, 

you must_the obtained mean by the constant in order to 

obtain the true mean. 


divide 


3. If you divide each score value by a constant, you must_ 

the obtained mean by the constant in order to obtain the true mean. 


multiply 
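
Frames 1 to 3 state that each adjustment by a constant is undone by the opposite operation on the obtained mean. The Python sketch below checks this on the raw scores of Plate 19; the constant 4 and all variable names are arbitrary choices made for illustration, not values from the program.

```python
def mean(values):
    """Arithmetic average: the sum of the raw scores divided by their number."""
    return sum(values) / len(values)


# Raw scores of Plate 19
scores = [20, 28, 25, 26, 30, 26, 21, 34, 31, 29, 22, 28, 32, 24]
c = 4  # arbitrary illustrative constant

true_mean = mean(scores)
via_add = mean([x + c for x in scores]) - c        # added, so subtract
via_multiply = mean([x * c for x in scores]) / c   # multiplied, so divide
via_divide = mean([x / c for x in scores]) * c     # divided, so multiply

print(true_mean, via_add, via_multiply, via_divide)
```

All four values agree up to floating-point rounding, which is why the tests compare them with a small tolerance rather than exact equality.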


4. You have used only positive numbers thus far. However, you may do
all statistical calculations with negative numbers as well as positive
ones as long as you handle the negative numbers properly. Thus: the
ll of score value -19 is -19.5; the ll of score value -4 is ____.


-4.5 


5. In cases where negative numbers make the computation difficult, it
is desirable to ____ a constant to the score values in order to
make them all positive.


add 
6. If the score value is -7 and you add a constant of 10, the score value
becomes ____.


3 


Plate 21

Raw Scores

 1    0   -1    4   -2
 2    2    1   -1    3    1


7. Prepare a frequency distribution, adding a constant of 3 to each
score value, and compute the μ.


Plate 22

 X    (X + 3)    f    fX
 4       7       1     7
 3       6       1     6
 2       5       2    10
 1       4       3    12
 0       3       1     3
-1       2       2     4
-2       1       1     1

N = 11   ΣfX = 43

obtained μ = 3.9

true μ = 3.9 - 3 = .9
8. PLATE 22. The constant of 3 was subtracted from the obtained μ
because it was ____ as a constant to each of the score values.


added 


9. PLATE 22. The mode of the distribution is ____.


1


10. PLATE 22. 1 is the mode because it is the most ____
score value in the distribution.


frequent 


11. PLATE 22. Determine the Mdn of the distribution, using Formula 1.
(Do not use the (X + 3) column. Instead use the original X values.)


Plate 23

Mdn = .5 + ((5.5 - 4) / 3) 1 = 1


12. The Mdn is the score value above which and below which ____% of
the raw scores lie.


50 
Plate 24

(a)  X    f          (b)  X    f
                         12    1
                         11    0
                         10    0
                          9    0
                          8    0
     7    1               7    0
     6    2               6    2
     5    2               5    2
     4    3               4    3
     3    4               3    4
     2    2               2    2
     1    1               1    1

     N = 15              N = 15


13. Compare the two distributions presented in Plate 24. The top score
of 7 in (24a) has been changed to ____ in (24b). All the other score
values and their f's have remained the same.


12 


14. PLATE 24. The mode for (24a) is_. This is __ 

(the same as/different from) the mode for (24b). 


3, the same as 


15. PLATE 24. The Mdn of (24a) and (24b) is ____ (the
same/different) because the score value which contains the Mdn is
____ (the same/different).


the same, the same 
16. PLATE 24. Of the three measures of central tendency, the only one
affected by the change in the raw score of 7 in (24a) to 12 in (24b)
is the ____ (symbol).


μ


17. Consider the raw scores: 7 8 9 9 9 10 11.

The mode of this set of raw scores is ____. The Mdn is ____. The μ is ____.


9, 9, 9 


18. Now if the score of 11 becomes 20; that is, if 7 8 9 9 9 10 11
becomes 7 8 9 9 9 10 20, the mode ____
(is the same/becomes larger/becomes smaller), the Mdn ____
(is the same/becomes larger/becomes smaller), the μ ____
(is the same/becomes larger/becomes smaller).


is the same, is the same, becomes larger 
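
The comparison in frames 17 and 18 can be reproduced with Python's standard `statistics` module. This is a hedged sketch: `statistics.median` simply returns the middle raw score and does not apply the interpolation of Formula 1, although for these seven scores both approaches give 9.

```python
import statistics

original = [7, 8, 9, 9, 9, 10, 11]
altered = [7, 8, 9, 9, 9, 10, 20]   # the extreme score 11 becomes 20

for label, scores in (("original", original), ("altered", altered)):
    print(label,
          "mode:", statistics.mode(scores),
          "median:", statistics.median(scores),
          "mean:", statistics.mean(scores))
```

Only the mean moves, from 9 to 72/7, because it is the only one of the three measures that uses the numerical value of every raw score.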


19. Because the Mdn is defined as the point above which and below which
50% of the scores lie, it ____ (is/is not) influenced by the
numerical value of the scores above or below it.


is not


20. If you add 10 points to a score above the Mdn, it ____
(will/will not) change the value of the Mdn.


will not 
21. Since the μ is determined by summing the raw scores and dividing
by N, if you add 10 points to a score above the μ, the value of the
μ will ____ (be decreased/be increased/remain
the same).


be increased 


22. The_is affected by the value of each of the raw scores, 

whereas the_is concerned only with the ordering of 

the raw scores. The_is concerned only with which score 

occurs most frequently. 


mean, median, mode 


23. You may desire to describe a point in the frequency distribution in 
terms of the percentage of scores falling below it. The term 
percentile is used to describe the score below which a given 
_of the total number of scores lie. 


percentage 


24. Thus, the 20th percentile is the score below which 20% of the scores 
lie. 40% of the scores lie below the score that is at the 


40th percentile 
25. Because the Mdn is the score value below which 50% of the scores
lie, it is also termed the ____ of the
distribution.


50th percentile 


Plate 25

  i      f
26-30    3
21-25    3
16-20    5
11-15    4
 6-10    3
 1-5     2

N = 20

26. If you wish to determine the score value that represents the 40th
percentile, you must first determine how many scores are 40% of
the total number of scores. In this case, N = 20. 40% of 20 is ____.


8 


27. PLATE 25. Eight scores represent 40% of the total number of
scores. Starting from the bottom of the distribution, sum the f
column until you determine which i contains the eighth score from
the bottom. i ____ contains the eighth score from the bottom
of the distribution.


11-15 
28. PLATE 25. The eighth score, which is at the 40th percentile, is
contained in i 11-15. To determine the exact score value of the
40th percentile, you may use Formula 1. However, you are now
seeking the 40th percentile instead of the 50th percentile (Mdn) so
you must change .5N to .____N.


.4


29. Substituting the value .4N for the 40th percentile, and the term "40th
percentile" for the symbol Mdn, Formula 1 reads:

40th percentile = ll + ((.4N - Σfb) / fw) i

PLATE 25. Determine the 40th percentile. Recall that it is located
in i 11-15.


Plate 26

40th percentile = 10.5 + ((8 - 5) / 4) 5 = 14.25


30. Write the formula for determining the 25th percentile of a frequency
distribution. Making the necessary changes, use Formula 1 as a
guide.


25th percentile = ll + ((.25N - Σfb) / fw) i


31. Formula 1 can be adapted for use in determining any percentile
as long as you use the decimal form of the percentage when
multiplying by the ____ (symbol).


N 
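
The adapted Formula 1 can be sketched in Python as follows. The function name, the (lowest, highest, f) tuple layout, and the convention that the lower limit is half a score point below the lowest score value of the interval (which assumes whole-number scores) are assumptions made for this illustration.

```python
def score_at_percentile(intervals, p):
    """Score value lying at percentile `p` (decimal form), by the adapted Formula 1.

    `intervals` lists (lowest, highest, f) from the bottom of the distribution up.
    """
    n = sum(f for _, _, f in intervals)
    target = p * n                      # e.g. .4N
    below = 0                           # sum of the f's below the interval
    for low, high, f in intervals:
        if f > 0 and below + f >= target:
            lower_limit = low - 0.5     # assumes whole-number score values
            width = high - low + 1      # i, the number of score values spanned
            return lower_limit + (target - below) / f * width
        below += f
    raise ValueError("p must lie between 0 and 1")


# Plate 25, listed from the lowest interval upward
plate_25 = [(1, 5, 2), (6, 10, 3), (11, 15, 4), (16, 20, 5), (21, 25, 3), (26, 30, 3)]
print(score_at_percentile(plate_25, 0.40))  # matches Plate 26
print(score_at_percentile(plate_25, 0.50))  # the Mdn
```

For ungrouped data each score value can be passed as an interval whose lowest and highest values are equal, so that i is 1.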
32. In determining a percentile for Plate 25, you were using a frequency
distribution for ____ (grouped/ungrouped) data.


grouped 


33. When determining percentiles for a frequency distribution of
ungrouped data, the i in Formula 1 is equal to ____.


1

34. PLATE 3. Determine the 30th percentile of the distribution. (Substi¬ 
tute the correct numerical values for this calculation in Formula 1.) 


Plate 27

30th percentile = 100.5 + ( … )1 = 101.1

35. The most commonly used percentiles are the 25th percentile, the
50th percentile, and the 75th percentile. The score at the 25th
percentile is known as the 1st quartile (symbol Q1) because one quarter
of the scores lie ____ it.


below 

36. The 2nd quartile (symbol Q2) is the point below which 50% of the
scores lie, and is also called the ____ (symbol).

Mdn

37. The 3rd quartile (symbol Q3) is the score at the ____ percentile,
and ____% of the scores lie below it.


75th, 75 
38. The symbol for the 1st quartile is Q1. The symbols for the 2nd and
3rd quartiles are ____ and ____.


Q2, Q3


39. 25% of the scores lie between Q1 and Q2. ____% of the scores lie
between Q2 and Q3. ____% of the scores lie between Q1 and Q3.


25, 50


40. The median lies at the_percentile which is also_(use 

symbol for the correct quartile). 


50th, Q2


41. In addition to quartiles, percentiles are sometimes referred to in
terms of deciles. Since the root "deci-" refers to tenths, the 1st decile
must lie at the ____ percentile.


10th


42. The symbol used for decile is D. D1 represents the 1st decile, or
10th percentile. D2 represents the 2nd decile, or 20th percentile.
The symbol for the 4th decile is ____ (symbol) and it lies at the
____ percentile.


D4, 40th


43. The 50th percentile lies at the same score value as ____ (give symbol
for the correct decile). It also lies at the same score value as ____
(give symbol for the correct quartile) and is known as the ____
(give symbol for the correct measure of central tendency).


D5, Q2, Mdn

44. Formula 1 permits you to determine the score value that lies at a
particular percentile. When you wish to determine the percentile
at which a particular raw score lies, you use Formula 3. In Formula
3 the score value for which the percentile is to be computed is
represented by ____ (symbol).


X 


45. To use Formula 3 in determining the percentile of a particular score,
first you must determine the interval within which the score lies.

PLATE 25. The raw score of 22 lies in i ____.


21-25 


46. PLATE 25. To determine the percentile value of raw score 22, which
lies in i 21-25, substitute the numerical values for the symbols in
Formula 3.

Percentile in decimal form = (((____ - ____) / ____) ____ + ____) / ____


Plate 28

Percentile in decimal form = (((22 - 20.5) / 5) 3 + 14) / 20


47. PLATE 28. Do the necessary arithmetic to determine the percentile 
in decimal form of raw score 22. 


.745 


48. PLATE 25. .745 is the decimal form of the percentile value of raw 
score 22. To convert the decimal form into a percentage, you must 

multiply the decimal form by_. Raw score 22 lies at the 

_percentile of the distribution. 


100, 74.5th percentile 
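
The calculation of Plate 28 can be sketched in Python. The form of Formula 3 used here is inferred from the worked numbers in the plates rather than quoted from the formula itself, and the function name, tuple layout, and half-point interval limits (which assume whole-number scores) are illustrative assumptions.

```python
def percentile_of_score(intervals, x):
    """Percentile, in decimal form, at which raw score `x` lies.

    `intervals` lists (lowest, highest, f) from the bottom of the distribution up.
    """
    n = sum(f for _, _, f in intervals)
    below = 0                                # sum of the f's below the interval
    for low, high, f in intervals:
        lower_limit = low - 0.5              # assumes whole-number score values
        upper_limit = high + 0.5
        if lower_limit <= x < upper_limit:
            width = high - low + 1           # i
            return ((x - lower_limit) / width * f + below) / n
        below += f
    raise ValueError("score lies outside the distribution")


plate_25 = [(1, 5, 2), (6, 10, 3), (11, 15, 4), (16, 20, 5), (21, 25, 3), (26, 30, 3)]
decimal_form = percentile_of_score(plate_25, 22)
print(round(decimal_form, 3), round(decimal_form * 100, 1))
```

Multiplying the decimal form by 100 gives the percentile, as frame 48 describes.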
49. PLATE 6, page 14. Determine the percentile value of raw score 20.
(Substitute, in Formula 3, the correct numerical values for this
calculation.)


Plate 29

Percentile in decimal form = (((20 - 18.5) / 3) 6 + 5) / 19 = .42

Percentile = .42(100) = 42nd percentile


50. The percentile value of any particular raw score indicates the
percentage of scores falling ____ this particular raw score.


below 


51. The percentile at which a particular raw score lies will tell you the
____ of scores lying below it.


percentage 


52. A raw score of 97 has no particular meaning because it tells you
nothing about its position relative to the other raw scores in the
distribution. However, to say that it is at the 75th percentile, or
at Q3, tells you that below it lie ____% of the raw scores and above
it lie ____% of the raw scores.


75, 25 
EXERCISES


1. Calculate the μ of the following distribution. Simplify your
calculations by adding a constant of 5.


 X    f
 2    2
 1    3
 0    5
-1    4
-2    3
-3    3
-4    1


2. Which measure or measures of central tendency are affected by 
altering the values of extreme scores in a frequency distribution? 
Which are not affected ? 

3. If John Jones has a language achievement score at the 80th per¬ 
centile, how can you interpret his score in respect to the total group? 

4. What percentile lies at the first quartile? The second quartile? 
The third quartile ? What symbols denote these quartiles ? 

5. What percentile lies at the first decile? The fourth decile? The 
sixth decile? What symbols denote these deciles? 

6. For the following distributions, determine the score values which
lie at the 40th percentile, at Q3, at Dg. Use the modification of
Formula 1 for your calculations.


(a)  X    f     (b)  i      f
    15    1         17-20   4
    14    2         13-16   6
    13    4          9-12   9
    12    6          5-8    4
    11    3          1-4    2
    10    3
     9    1


7. Determine the percentiles of score values 11, 13, and 14 in Exercise
6(a), using Formula 3.

8. Determine the percentiles of score values 6, 12, and 17 in Exercise
6(b), using Formula 3.





set 5 

MEASURES OF VARIABILITY: range
DISTRIBUTION CURVES 


The score value which represents the central tendency of a frequency
distribution tells us nothing about the spread of the scores around that
point. In addition to the measure of central tendency, we need a measure
of variability that describes dispersion of the raw scores. In this set
you will be introduced to several methods of describing the variability
of scores. There will be a discussion of the concept of variability and
definitions of range, interquartile range, and semi-interquartile range.
The frequency polygon is given as another method of describing variability
for both grouped and ungrouped data. Here you will be introduced to the
concept of a "smoothed" curve and given definitions of bi-modal and
uni-modal distributions, and of symmetrical, negatively skewed, and
positively skewed distributions. You will learn how the mean, median, and
mode are related to the shape of the frequency polygon and how they are
affected by the skewness of the distribution.

SPECIFIC OBJECTIVES OF SET 5 

At the conclusion of this set you will be able to: 
(1) calculate the range of a frequency distribution using Formula 4.

(2) calculate the interquartile range in a frequency distribution. 

(3) calculate the semi-interquartile range in a frequency distribution 
using Formula 5. 

(4) determine the effect of extreme score values on range and Q. 

(5) draw a frequency polygon of a frequency distribution. 

(6) identify bi-modal, uni-modal, symmetrical, and negatively and 
positively skewed frequency polygons. 

(7) describe the relationship of mean, median, and mode to the shape 
of a frequency polygon. 
1. To say that a group of data has a μ of 19 or a Mdn of 27 gives you a
representative score for the data but tells you nothing about the
spread of the scores, or the extent to which they vary among
themselves. To describe a set of scores fully, you need both a
representative score and a measure of the extent to which they ____.


vary 


2. The simplest measure of the variability of a group of scores is the 
range. The range is simply the number of score values encompassed 
by the data. 

PLATE 15, page 42. The score values encompassed are 25, 24, 
23, 22, and 21. The range for this distribution is_score points. 


5 


3. Formula 4 presents the formula for the calculation of the range. 
Using this formula, calculate the range of a frequency distribution 
having 10 as the lowest score value and 25 as the highest score 
value. 


25 - 10 + 1 = 16 


4. PLATE 6, page 14. The range for this frequency distribution of
grouped data is 30 - 10 + 1 = 21.

PLATE 7, page 19. The range is ____.


30 


5. The range is not a very useful measure because it tells you nothing
about the distribution of the raw scores. It tells you only the number
of score values within which all the ____ lie.


raw scores 
6. Other, somewhat more useful, measures of the variability of a group
of raw scores are the interquartile range and semi-interquartile
range. The interquartile range is determined by the difference
between Q1 and Q3, which encompasses ____% of the scores.


50 


7. Because the prefix "semi-" means half, in order to determine the
semi-interquartile range of a frequency distribution, you must
divide the interquartile range by ____.


2 


8. Formula 5 presents the formula for the calculation of the
semi-interquartile range. The symbol commonly used for this measure
is Q. The Q for a distribution in which Q1 = 12 and Q3 = 28 is ____.


8 
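
The two measures introduced so far in this set can be sketched in Python. Formulas 4 and 5 themselves are not reproduced in this part of the text, so the forms below are inferred from the worked answers in frames 3, 4, 7, and 8; the function names are assumptions for illustration.

```python
def score_range(lowest, highest):
    """Range as the number of score values encompassed: highest - lowest + 1."""
    return highest - lowest + 1


def semi_interquartile_range(q1, q3):
    """Q: half of the interquartile range, the difference between Q3 and Q1."""
    return (q3 - q1) / 2


print(score_range(10, 25))               # frame 3
print(score_range(10, 30))               # Plate 6, frame 4
print(semi_interquartile_range(12, 28))  # frame 8
```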


9. When describing a frequency distribution, Q is often reported along
with the ____ (symbol for a measure of central tendency) because
both of these measures are derived from score values that lie at
percentiles in the distribution.


Mdn 


10. If you are told that Mdn = 25 and Q = 4, you would know that ____% of
the scores lie above and below score 25, and that the middle ____% of
the scores lie within a range of 8 score points.


50, 50 
11. You are given this information: lowest score is 105; highest score
is 135; Mdn = 120; Q = 7. Range = ____. 50% of the scores lie
between 105 and ____. 50% of the scores lie between ____ and
135. The middle 50% of the scores lie within a range of ____ score
points.


31, 120, 120, 14 


12. You are given this information: Q1 = 30; Q2 = 59; Q3 = 85.
____% of the scores lie below the score of 85. The midpoint of the
distribution is at score ____. Q = ____.


75, 59, 27.5 


13. The range and Q do not take into consideration the value of all of the
raw scores in a distribution. The range considers only the upper and
lower scores. Q considers only the middle ____% of the scores.


50


14. If the uppermost score in any distribution is increased by 10 points,
the range ____ (is increased/is decreased/
remains the same). Q ____ (is
increased/is decreased/remains the same).


is increased, remains the same 
15. In order to get the total picture of a frequency distribution, it is
desirable to depict it in graphic form. Plate 30 presents a frequency
distribution and its graphic representation. Notice that the score
values are presented along the ____ (horizontal/
vertical) axis.

Plate 30 


Frequency 

Distribution Frequency Polygon 



horizontal 


16. PLATE 30. The frequencies of the score values are presented along
the ____ (horizontal/vertical) axis.


vertical 


17. From Plate 30 you learn that a frequency distribution presented in 
graphic form is called a_. 


frequency polygon 


18. From Plate 30 you learn that, along the horizontal axis, the score
values are presented with the lowest value at the ____ (left/
right) and the highest value at the ____ (left/right).


left, right 


19. From Plate 30 you also learn that, along the vertical axis, the f's
of the score values are presented with the lowest f at the ____
(bottom/top) of the axis and the highest f at the ____ (bottom/top)
of the axis.


bottom, top 


20. Thus, the lowest values of both the horizontal and vertical axis in a 

frequency polygon are located in the_corner of 

the graph. 


lower left 


21. PLATE 30. The f for each score value is indicated by a dot. The f
for score value 86 is 4, so the dot for this score value is placed
directly above score value 86 and directly ____ from f 4.
(This is illustrated by the dotted lines in the plate.)


across 


22. PLATE 30. When the dots that represent the f's of each score value
are connected by a line, this line gives the "shape" of the
distribution. The mode is score value ____ because it has the highest peak
in the polygon.


86 
23. PLATE 30. The frequency polygon depicts the score values of 83 and
91 as having a frequency of ____. This is customary when
presenting a frequency polygon so that the extremes of the distribution
are positively identified.


zero 


24. PLATE 30. Although the score values of 83 and 91 are listed on the
horizontal axis of the polygon, the actual range of scores in the
distribution is from ____ to ____.


84, 90 


Plate 31 



X 

25. In this frequency polygon, the f of score value 103 is 4 because the
dot representing it lies directly across from the 4 in the vertical f
axis. The f of score value 101 is ____. The f of score value 108
is ____.


2, 0


26. PLATE 31. The mode is score value 


104 









27. PLATE 31. The mode is score value 104 because it has the largest
____ (symbol), which is depicted by the peak in the frequency polygon.


f


28. PLATE 31. The lowest score value that has an f is _____. The
highest score value that has an f is _____. Therefore, the range
of score values is from _____ to _____. Using Formula 4, determine
the range of the distribution.


101, 107, 101 to 107 
Range = 7. 
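
The range computation in frame 28 can be checked with a short Python sketch. Formula 4 is not reproduced in this section, so the sketch assumes it is "highest score value minus lowest score value plus one," which is what the answer above (Range = 7 for scores 101 to 107) implies. Apart from the f's given in frames 25 to 28 (101 has f = 2, 103 has f = 4, 104 is the mode, 108 has f = 0), the frequencies are hypothetical.

```python
def inclusive_range(freq):
    """Range of a distribution, assuming Formula 4 is highest - lowest + 1.

    Score values with f = 0 are ignored, because they are not actual scores.
    """
    scored = [X for X, f in freq.items() if f > 0]
    return max(scored) - min(scored) + 1


# Partly hypothetical frequencies in the spirit of Plate 31 (assumption).
plate_31_like = {101: 2, 102: 3, 103: 4, 104: 6, 105: 3, 106: 2, 107: 1, 108: 0}

print(inclusive_range(plate_31_like))
```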


29. PLATE 12, page 32. Prepare a frequency polygon for this frequency 
distribution. Use Plate 31 as a guide. 


Plate 32


30. Plate 33 presents a frequency polygon for a frequency distribution
of grouped data. For grouped data, the designation along the horizontal
axis is for each _____ (symbol) instead of for each X as in
ungrouped data.


Plate 33

[Frequency polygon for grouped data; horizontal axis i, vertical axis f]


i


31. PLATE 33. The vertical axis is marked off in steps of _____ f's each.
Where the f for a particular score value is between the f markings
on the vertical axis, the dot representing the f for that score value
is placed _____ the markings.


5, between 


32. PLATE 33. The f for i 11-20 is _____ because it lies not quite halfway
between the f markings of 5 and 10.


7 


33. PLATE 33. The f for i 21-30 is _____.


10


34. PLATE 6, page 14. Prepare a frequency polygon for this frequency
distribution.


Plate 34

[Frequency polygon; horizontal axis i]


35. When there are many different score values involved in a frequency
polygon, and the N is quite large, the line that connects the _____'s
(symbol) of the various score values becomes a smooth "curve."
Plate 35 depicts a smooth "curve."

Plate 35
A Smooth Curve

[Smooth curve; horizontal axis X, vertical axis f]


f


36. A number of types of curves can be drawn from frequency distributions.

Plate 36

A Curve of a Bi-modal
Distribution

[Curve with two peaks; horizontal axis X, vertical axis f]


This curve is for a bi-modal distribution because two score values
have the frequency of 7, which means there are two _____ of
the distribution.


modes 


Plate 37 

A Curve of a Negatively 
Skewed Distribution 



37. Another type of curve is depicted in Plate 37. This curve is said to
be skewed to the left because the slope of the curve trails off to the
_____. This is sometimes referred to simply as a skewed distribution.


left


38. PLATE 37. This curve, which is skewed to the left, is said to be
negatively skewed since the "tail" of the curve trails off to the end
of the horizontal axis that represents the _____ (lower/higher)
score values.


lower 


Plate 38 


A Curve of a Positively 
Skewed Distribution 



39. This is also a skewed distribution. The frequency polygon is skewed
to the _____ because the slope of the curve trails off to the _____.


right, right


40. PLATE 38. This curve is _____ (positively/negatively)
skewed because the slope of the curve trails off to the right.


positively 


41. A good way to determine whether a curve is negatively or positively 
skewed is to remember that if there were negative score values in 

the distribution they would be at the_(left/right) end of 

the horizontal axis. 


left


Plate 39

A Curve of a Symmetrical
Distribution

[Symmetrical curve; the perpendicular line at the peak is labeled Mdn and Mode]


42. Plate 39 depicts a frequency polygon that is symmetrical. That is,
if the curve is divided in half by a line drawn perpendicular to the
horizontal axis, the two halves are identical in shape. This is called
a _____ distribution.


symmetrical 


43. PLATE 39. The perpendicular line marks the peak of the distribution;
therefore, it designates the _____ (measure of central
tendency) of the distribution.


mode 


44. The distributions shown in Plates 37, 38, and 39 are all called
uni-modal distributions because they all have only _____ mode.


one 


45. In all uni-modal symmetrical distributions, such as in Plate 39, the
three measures of central tendency, namely the mode, the _____ (symbol),
and the _____ (symbol), all coincide at the same score value.


Mdn, μ


Plate 40

A Curve of a Positively
Skewed Distribution



46. Plate 40 presents a frequency polygon of a positively skewed
distribution. When a distribution is skewed, the mean, median, and the
mode _____ (do/do not) coincide.


do not 


47. PLATE 40. The mode, of course, occurs at the peak of the curve.
Because the _____ is affected by the extreme score values that
occur at the tail of a skewed distribution, it will always fall
somewhere between the mode and _____ of the distribution.


mean, tail 


48. PLATE 40. In a skewed distribution, the Mdn is not as affected by
extreme scores as the μ; therefore, it lies somewhere between the
_____ and the _____ of the distribution.


mode, mean 
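
The relationship described in frames 46 to 48 can be checked numerically. The sketch below uses a hypothetical positively skewed frequency distribution (it is not the data behind Plate 40) and Python's standard `statistics` module. Note that `statistics.median` simply takes the middle of the sorted raw scores, which may differ slightly from a median obtained by the interpolation formula used for frequency distributions.

```python
import statistics

# Hypothetical positively skewed distribution (score value -> f); assumption.
skewed = {1: 2, 2: 6, 3: 4, 4: 3, 5: 2, 6: 1, 9: 1, 12: 1}

# Expand the frequency table into the individual raw scores.
raw_scores = [X for X, f in skewed.items() for _ in range(f)]

mode = statistics.mode(raw_scores)      # peak of the distribution
median = statistics.median(raw_scores)  # middle raw score
mean = statistics.mean(raw_scores)      # pulled toward the long right tail

print(mode, median, mean)
```

For these scores the three measures come out in the order peak to tail: the mode is lowest, the mean is highest, and the median lies between them.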


49. To summarize, the order of the three measures of central tendency 
for a uni-modal skewed distribution, starting from the peak and 
proceeding toward the tail, is_,_, and_. 


mode, median, mean


50. The measure of central tendency that is to be reported depends upon
the purpose to which it is to be put. For instance, if you want to
report the value of the most typical score, you would report the
_____.


mode 


51. If you wish to report which score lies at the center or midpoint of
the distribution, you would report the _____ (symbol).


Mdn 


52. Of the three measures of central tendency, the mean is the most
stable, because it is computed by using the value of each raw score
in the distribution, whereas the _____ and the _____ are
not.


mode, median 


53. The mode and the median are known as "terminal statistics" because
they do not lend themselves to further statistical manipulations.
However, the _____ is the most stable and useful statistic and
is used in all further calculations involving a measure of central
tendency.


mean


EXERCISES


1. Draw frequency polygons for the following frequency distributions. 


      (a)             (b)              (c)              (d)
    X     f         i      f         X     f          i       f

   20     2       36-40    2       106     1      105-108     4
   19     4       31-35    5       105     0      101-104     6
   18     4       26-30    4       104     3       97-100     9
   17     7       21-25    3       103     4       93-96     10
   16     6       16-20    5       102     5       89-92      9
   15     3       11-15    2       101     7       85-88      6
   14     2        6-10    1       100     5       81-84      4
   13     2








2. For each of the above frequency distributions, calculate the range 
using Formula 4, and the semi-interquartile range using Formula 5. 

3. Which frequency polygon(s), prepared for Exercise 1, can be termed 
uni-modal distribution(s)? 

4. Which can be termed bi-modal distribution(s)? 

5. Which can be termed skewed distribution(s)? Is the skewness to
the left or to the right?

6. Which can be termed symmetrical distribution(s)? 

7. (a) If the highest score in a frequency distribution is reduced in 

value, but still remains the highest score, will the range of the 
distribution be affected? If so, how? 

(b) Will the semi-interquartile range be affected? If so, how? 

8. Which measure(s) of central tendency is most affected by the degree
of skewness in a frequency distribution? Which is least affected?


set 6

MEASURES OF VARIABILITY: average deviation
NORMAL DISTRIBUTION


The interquartile and semi-interquartile ranges give some measure of 
the spread, or variability, of scores. However, these measures are 
usually too gross for the statistician, who needs a more refined measure 
of variability—one which takes into account the actual score value of each 
of the raw scores. This set will present the meaning of an absolute 
deviation score and the method of calculating the average deviation. 
It will compare the average deviation with the range and semi- 
interquartile range. In this set, you will also be introduced to a very 
important concept in statistics—that of the normal distribution. You 
will see the relationship between the mean, median, and mode in a normal 
distribution. This set will also introduce the standard deviation—a very 
important measure of variability—and will relate it to percentages of 
scores in the distribution. 

SPECIFIC OBJECTIVES OF SET 6 

At the conclusion of this set, you will be able to:

(1) calculate positive and negative deviation scores, using Formula 6.

(2) describe the meaning of the term absolute deviation.

(3) calculate the average deviation of a distribution, using Formula 7.

(4) state the relationship of the mean, median, and mode in a normal
distribution.

(5) determine the percentage of scores lying between various standard
deviation values.

(6) identify the symbols x, |x|, and σ.



1. Two sets of data may have the same μ, but the spread, or variability,
of the raw scores around the μ may be quite different.

Plate 41


The range of scores in (a) is _____ (larger/smaller) than
the range of scores in (b).


larger 


2. The simplest measure of the variability of the raw scores in a fre¬ 
quency distribution is the_, which indicates the spread of 

the scores. 


range 


3. The semi-interquartile range is another measure of the 
_of the raw scores in a distribution. 


variability/spread


4. A more useful measure of variability that indicates how a set of raw
scores vary from the mean is called the average deviation. This is
simply the _____ amount that the raw scores deviate from
the mean.


average 


5. To determine the average deviation of the raw scores from the mean,
you must first determine the extent to which each raw score _____
from the mean.


deviates 


6. The amount that a raw score deviates from the mean is the difference
between the value of the raw score and the value of the _____.


mean 


7. The symbol x represents the amount of deviation of a raw score
from the mean. The formula for the deviation of one X from the μ
is presented in Formula 6, in which X is the value of the _____
_____ and μ is the value of the _____.


raw score, mean 


8. FORMULA 6. If X = 17 and μ = 14, then x = 3 because x = X - μ.
If X = 20 and μ = 15, then x = _____.


5 


9. FORMULA 6. If X = 20 and μ = 25, then x = -5 because x = 20 - 25
= -5. If X = 70 and μ = 80, then x = _____.


-10


10. x is called a "deviation score" because it represents the amount of
_____ of a particular X from the μ.


deviation 


11. Using Formula 6, the deviation scores for the raw scores that are
smaller than μ will be _____ (positive/negative). The
deviation scores for the raw scores that are larger than μ will be
_____ (positive/negative).


negative, positive 


12. Because the μ is the arithmetic average of all the raw scores in a
distribution, it follows that the sum of the negative deviation scores
is _____ (in absolute value) to the sum of the positive deviation scores.


equal 


13. If the sum of the negative deviation scores is equal to the sum of the 
positive deviation scores, then the sum of all the deviation scores 
must be . 


zero 


14. If the sum of the negative deviation scores is equal to the sum of the 
positive deviation scores, the average of the sum of all deviation 
scores in a distribution is also equal to_. 


zero 
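
Frames 12 to 14 can be verified with a few lines of Python. The sketch below uses the frequency distribution that appears later in this set as Plate 42 (μ = 9, N = 18) and applies Formula 6, x = X - μ, to every score value; the variable names are chosen only for this illustration.

```python
# Frequency distribution from Plate 42 (score value X -> f).
freq = {12: 1, 11: 2, 10: 4, 9: 5, 8: 2, 7: 3, 6: 1}

N = sum(freq.values())
mu = sum(X * f for X, f in freq.items()) / N

# Formula 6: deviation score x = X - mu, weighted by f when summed.
positive_sum = sum(f * (X - mu) for X, f in freq.items() if X > mu)
negative_sum = sum(f * (X - mu) for X, f in freq.items() if X < mu)
total = positive_sum + negative_sum

print(mu, positive_sum, negative_sum, total)
```

The positive and negative sums have the same size and opposite signs, so their total, and therefore their average, is zero. This is why the signs must be dropped before an average deviation can be computed.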


15. When you do not consider the sign of a deviation score, but only its
value, it is called an absolute deviation score. The average deviation
is the sum of the absolute deviation scores divided by _____ (symbol).


N


16. When you do not consider the sign of a deviation score, but only its
value, it is called an _____ deviation score.


absolute 


17. The symbol for the absolute deviation score is |x|. If X = 15 and
μ = 10, then |x| = 5. If X = 45 and μ = 41, then |x| = _____.


4 


18. Formula 7 presents the formula for the calculation of the average
deviation. The symbol Σ|x| in this formula means that you sum the
_____ deviation scores.


absolute 


Plate 42

    X     f     fX      x     |x|    f|x|

   12     1     12      3      3      3
   11     2     22      2      2      4
   10     4     40      1      1      4
    9     5     45      0      0      0
    8     2     16     -1      1      2
    7     3     21     -2      2      6
    6     1      6     -3      3      3

  N = 18   ΣfX = 162          Σf|x| = 22

  μ = 9


19. Plate 42 presents a frequency distribution. The fourth column pre¬ 
sents the deviation score for each score value. The fifth column 
presents the absolute deviation score values, which are the same as 

the deviation score values except that they are all_ 

(positive/negative). 


positive


20. PLATE 42. μ = 9. Using Formula 6, for the score value of 12 the
deviation score value is:

x = X - μ = 12 - 9 = 3

The deviation score values appear in the fourth column of the plate.
The deviation score value of score value 7 is _____.


-2 


21. PLATE 42. The absolute deviation score values that appear in the
fifth column are the same as those in the _____ (symbol) column except
that the negative signs have been omitted.


x


22. PLATE 42. The f|x| for each score value is computed by multiplying
the absolute deviation score value |x| by its _____ (symbol).


f


23. PLATE 42. Σf|x| is obtained by _____ the f|x| for all
score values.


summing 


24. PLATE 42. Σf|x| = 22, N = 18.

Using Formula 7 for the calculation of the Average Deviation, substitute
these values for the symbols and determine A.D.


A.D. = 22/18 = 1.22


25. PLATE 42. The Average Deviation = 1.22. Therefore, the average
number of score points that the raw scores in the distribution deviate
from the _____ is 1.22.


mean


26. The Average Deviation takes into account the numerical values of the
raw scores, whereas the range or semi-interquartile range does
not. Therefore, the Average Deviation is a _____ (better/worse)
indicator of the variability of the raw scores.


better 
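
The computation in frames 19 to 25 can be sketched in Python. Formula 7 itself is not reproduced in this section; the sketch assumes it is A.D. = Σf|x| / N, as frame 15 describes. The function name `average_deviation` is chosen only for this illustration; the data are those of Plate 42.

```python
def average_deviation(freq):
    """Average deviation of a frequency distribution (score value -> f).

    Assumes Formula 7 is A.D. = (sum of f * |x|) / N, where x = X - mu.
    """
    N = sum(freq.values())
    mu = sum(X * f for X, f in freq.items()) / N
    sum_f_abs_x = sum(f * abs(X - mu) for X, f in freq.items())
    return sum_f_abs_x / N


# Frequency distribution from Plate 42.
plate_42 = {12: 1, 11: 2, 10: 4, 9: 5, 8: 2, 7: 3, 6: 1}

print(round(average_deviation(plate_42), 2))
```

Unlike the range, which uses only the two extreme scores, every score value and its f enter this calculation.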


27. Using Plate 42 as a guide, determine the Average Deviation (A.D.) 
of the frequency distribution presented in Plate 9. (Page 25.) 


Plate 43

    X     f      fX      x     |x|    f|x|

   90     3     270      2      2      6
   89     3     267      1      1      3
   88     1      88      0      0      0
   87     4     348     -1      1      4
   86     1      86     -2      2      2
   85     1      85     -3      3      3

  N = 13   ΣfX = 1144          Σf|x| = 18

  μ = 88

  A.D. = 18/13 = 1.4



28. Because the A.D. gives you only a measure of how much, on the
average, the raw scores in a distribution deviate from the _____,
it is not as useful as the next measure of variability to be considered.


mean 


29. The frequency distribution represented by the curve in Plate 44
is called a normal distribution because the measurements of many
traits that occur in nature are normally distributed in this manner.
The curve of such a distribution is called a normal curve and is a
bell-shaped, symmetrical one in which the _____, _____,
and _____ all coincide.


mean, median, mode 










30. If you measured the heights of all seventh-grade children, the fre¬ 
quency distribution of their heights would be in the form of the normal 
distribution and the frequency polygon would be in the shape of the 
_curve. 


normal 


31. The curve which represents the weights of a population of pygmies,
or the IQ scores of all twelve-year-old children, would resemble the
curve in Plate 44 and would be called a _____.


normal curve 


32. Because so many types of data are distributed in the form of the
normal curve, mathematicians have derived certain functions of
the curve that are helpful to the statistician. The μ and the Mdn
coincide; therefore, _____% of the scores lie to the right of the μ
and _____% of the scores lie to the left of the μ.


50, 50 


Plate 44 

Normal Curve 



33. The area under the curve represents the total frequencies of the
score values. Thus, _____% of the area under the curve lies to the
left of the μ and _____% of the area under the curve lies to the right
of the μ in a normal distribution.


50, 50


34. The standard measure of the variability of the raw scores in a normal
distribution is called a standard deviation. This is a measure of the
extent to which the scores _____ from the mean.


deviate 


35. The symbol for the standard deviation is the small Greek letter σ,
which is pronounced "sigma." Recall that the capital Greek letter Σ
is also pronounced "sigma" and is the symbol meaning _____.


the sum of 


36. The standard deviation, σ, is always measured from the μ of the
distribution because it is a measure of the extent to which the raw
scores _____ from the μ.


deviate 


Plate 45 

Normal Curve 



37. A line is dropped from the point of inflection in the normal curve.
The point at which this line meets the horizontal axis is designated
as lying one standard deviation from the μ. The symbol used
to designate this point is _____.


1σ


38. The symbol 1σ means that this point on the horizontal axis is located
_____ standard deviation from the μ.


1 


39. Plate 46 has 1σ marked on the right side of the μ. Because the
curve is symmetrical, the same distance to the left of the μ is designated
by the symbol _____ and is located minus one standard
deviation from the μ.


Plate 46 

Normal Curve 



Below the Mean Above the Mean 


-1σ


40. PLATE 46. The point designated by 1σ is said to be one standard
deviation above the mean because it is located on the right of the
μ where the score values are larger than the μ. The same point to
the left of the μ is designated -1σ and is said to be one standard
deviation _____ the μ.


below 


41. PLATE 46. The distance between the μ and 1σ encompasses 34.13%
of the area under the curve. Because the area under the curve
represents the frequency of the scores in the distribution, it can be
said that _____% of the scores are contained between the μ
and 1σ.


34.13


42. PLATE 46. Because the distribution is symmetrical, the distance
between the μ and -1σ also encompasses _____% of the scores.


34.13 


Plate 47 



NOTE: Figures refer to percentages of total area 
bounded by a normal curve. 

43. Plate 47 indicates how much of the area of the normal curve is
included between any two perpendiculars indicating standard deviation
units. The distance along the horizontal axis from μ to 1σ is
equal to the distance between 1σ and 2σ. This is the same distance
as between 2σ and _____.


3σ


44. PLATE 47. The standard deviation designations to the left of the μ
are exactly equal to those on the right of the μ and the proportion
of scores contained in the areas to the left and right of μ are also
exactly the _____.


same 


45. PLATE 47. The percentage of scores that lie between μ and 1σ is
34.13%, but the percentage of scores that lie between 1σ and 2σ is
only _____%.


13.60


46. The percentage of scores that lie between μ and 2σ is obtained by
summing the percentages contained between these two points:
34.13% + 13.60% = _____%.


47.73 


47. PLATE 47. Between -1σ and 1σ are contained _____% of the
scores.


68.26 


48. PLATE 47. The percentage of scores that lie between -3σ and
3σ is _____%.


99.78 


49. The mathematically derived curve of a normal distribution never
touches the horizontal axis. However, for practical purposes very
few scores lie outside of -3σ and 3σ.

PLATE 47. _____% of the scores lie outside of -3σ and 3σ.


.22 
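
The areas read from Plate 47 can also be computed directly from the mathematical normal curve. The sketch below uses the error function `math.erf` from Python's standard library; the function name `area_between` is chosen only for this illustration. The exact areas agree with the plate to about one decimal place; in the second decimal they differ slightly (for example, 13.59 rather than 13.60 between 1σ and 2σ, and 99.73 rather than 99.78 between -3σ and 3σ), so the plate's figures should be treated as rounded working values.

```python
import math


def area_between(z_low, z_high):
    """Proportion of a normal distribution lying between two points,
    each expressed in standard deviation units from the mean."""
    def cumulative(z):
        # Proportion of the area lying to the left of z.
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return cumulative(z_high) - cumulative(z_low)


print(round(100 * area_between(0, 1), 2))    # between the mean and 1 sigma
print(round(100 * area_between(1, 2), 2))    # between 1 sigma and 2 sigma
print(round(100 * area_between(-1, 1), 2))   # between -1 sigma and 1 sigma
print(round(100 * area_between(-3, 3), 2))   # between -3 sigma and 3 sigma
```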


50. Remember, the normal curve is not a frequency polygon of an actual 
set of data, but is a mathematically derived curve. The properties 
of the normal curve are very useful to statisticians because the 
frequency distributions of many sets of data resemble the 
_curve. 


normal 


51. Knowing the μ and σ of a set of data that is normally distributed
provides you with sufficient information to describe fully the set of
data. For example, in a set of spelling scores for all seventh-grade
children, μ = 80, σ = 5. Thus, 1σ above the μ is 5 points above score
value 80, which is score value _____.


85


Plate 48

Frequency Distribution of
Spelling Scores

[Normal curve; horizontal axis labeled in Standard Deviations and in Spelling Scores]


52. Plate 48 presents the normal distribution of spelling scores where
μ = 80 and σ = 5. The score value of 80 is placed at the μ, and 1σ is
5 points above the μ, at score value 85. -1σ is 5 points below the μ
at score value _____.


75 


53. Refer to Plate 47 whenever you need the properties of the normal
distribution.

PLATE 48. Between the μ and 1σ are _____% of the scores.
Therefore, if the μ = 80 and σ = 5, then _____% of the children
received scores between 80 and 85.


34.13, 34.13 


54. PLATE 48. 34.13% of the children received scores between the
values of 80 and 85. _____% of the children received scores
between the values of 85 and 90. _____% of the children received
scores between the values of 80 and 90.


13.60

47.73 (34.13% + 13.60%)








55. PLATE 48. _____% of the children received scores between the
values of 75 and 85.


68.26 (34.13% + 34.13%) 

56. PLATE 48. _____% of the children received scores between the
values of 70 and 90.


95.46 

(13.60% + 34.13% + 34.13% + 13.60%) 

57. PLATE 48. _____% of the children received scores lower than 65.


.11 

58. PLATE 48. _____% of the children received scores outside of score
values 65 and 95.


.22 


59. PLATE 48. If N = 300, the number of children receiving scores
between the μ and 1σ would be 34.13% of 300, which is 102.4 scores.
This means that a frequency of 102.4 scores lies between the score
values of _____ and _____.


80, 85 
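
Converting a percentage of the area into a number of scores is a single multiplication, as this short Python sketch shows. The function name `expected_count` is chosen only for this illustration, and the percentages are the ones read from Plate 47.

```python
def expected_count(percent, n):
    """Number of scores expected in a region holding `percent` of the area."""
    return percent / 100 * n


N = 300

# Between the mean and 1 sigma (score values 80 to 85 in Plate 48).
between_mean_and_1_sigma = expected_count(34.13, N)

# Between -1 sigma and 1 sigma: the two 34.13% areas are summed first.
between_minus_1_and_1_sigma = expected_count(34.13 + 34.13, N)

print(round(between_mean_and_1_sigma, 1))
print(round(between_minus_1_and_1_sigma))
```

Because the normal curve is a mathematical model, the result need not be a whole number; it is rounded when a count of actual children is wanted.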


60. PLATE 48. If N = 300, the number of children that received scores
between the values of 75 and 85 would be _____.


205 (34.13% + 34.13% = 68.26%)

(68.26% × 300 = 204.78,
or approximately 205)


EXERCISES


1. What do the symbols x and |x| represent?

2. What does the term "absolute deviation score" mean? How is it 
calculated? 


3. For the following frequency distributions, calculate the positive and 
negative deviation scores, using Formula 6. Calculate the average 
deviation for each distribution, using Formula 7. 


(a) X f

10 1 

9 2 

8 5 

7 4 

6 3 

5 3 

4 1 


(b) X f

51 1 

50 1 

49 1 

48 4 

47 3 

46 2 

45 2 


4. What is the position of the μ, Mdn, and mode in a normal distribution?


5. What does the symbol σ represent?

6. Using Plate 47, answer the following questions.

(a) What percentage of scores lies between μ and 1σ?

(b) What percentage of scores lies between -1σ and 1σ?

(c) What percentage of scores lies outside -2σ and 2σ?

(d) What percentage of scores lies above 3σ?


set 7

MEASURES OF VARIABILITY: standard deviation 


In the last set a measure of variability called the standard deviation 
was introduced and related to the distribution of raw scores in a fre¬ 
quency distribution. In this set, the formula and method for calculating 
the standard deviation for a set of raw scores will be presented. 

The degree to which any raw score deviates from the mean may be 
expressed in terms of its standard deviation from the mean. Thus, the 
raw score can be expressed as a "standard normal deviate." This term 
is defined in this set, and we will show how it is used to ascertain the 
proportion of scores falling between any specific values in a frequency 
distribution. You will also be introduced to the table of normal curve 
functions and given some practice in its use. 

SPECIFIC OBJECTIVES OF SET 7 

At the conclusion of this set you will be able to: 

(1) use Formula 8 to calculate the standard deviation of a frequency
distribution by the deviation score method.

(2) use Formula 9 to calculate the standard deviation of a frequency
distribution by the raw score method.

(3) use Formula 10 to calculate the standard normal deviate for
specific scores in a frequency distribution.

(4) use Table 1 to determine the proportion of scores between two
points in a normal distribution.

(5) identify the symbol z.


1. Formula 8 presents the formula for the calculation of the standard
deviation. You will notice that x is used in the formula. Recall 

from Formula 6 that x represents a_ score. Do 

not confuse this with X, which represents a raw score. 


deviation 


2. FORMULA 8. The expression Σfx² means that in a frequency distribution,
first you square the deviation score value; then multiply
it by the f of the score value; then _____ all of these products.


sum 

3. FORMULA 8. Be sure that you square the deviation score value
before you multiply by f and sum the products. Do NOT sum them
first and then square the sum. After determining Σfx², you divide
it by _____ (symbol) and take the square root of the quotient.


N 


Plate 49

    X     f     fX      x     x²     fx²

   12     1     12      3      9      9
   11     2     22      2      4      8
   10     4     40      1      1      4
    9     5     45      0      0      0
    8     2     16     -1      1      2
    7     3     21     -2      4     12
    6     1      6     -3      9      9

  N = 18   ΣfX = 162          Σfx² = 44

  σ = √(Σfx²/N) = √(44/18) = 1.56


4. Plate 49 presents a frequency distribution, with the calculation of
its standard deviation. For score value 12 the deviation score (x)
is 3. To obtain x² you square the value of 3, which equals _____. The
value of the deviation score squared appears in the x² column.


9


5. PLATE 49. For score value 12, x = 3 and x² = 9. The f for this
score value is _____; therefore, to obtain fx² you must multiply 9 times
_____, which equals _____. This value appears in the fx² column of the
plate.


1, 1, 9 


6. PLATE 49. For score value 7, x = -2. To obtain x² you must square
-2, which equals _____. Remember, when you multiply a negative
number times a negative number, you obtain a positive product.


4 


7. PLATE 49. For score value 7, x = -2 and x² = 4. The f for this
score value is 3; therefore, to obtain fx² you must multiply _____ times
_____, which equals _____. This value appears in the fx² column of the
plate.


3, 4, 12 


8. PLATE 49. Σfx² is determined by _____ all of the numbers
in the fx² column.


summing


9. PLATE 49. Using Formula 8, the σ is calculated by substituting
the numerical values for the symbols and doing the necessary arithmetic.
For this frequency distribution, σ = _____.


1.56 


10. PLATE 49. μ = 9, σ = 1.56, N = 18.

If the scores in this distribution were perfectly normally distributed,
the score value at 1σ would be 9 + 1.56 = 10.56. The score value at
2σ would be 9 + 1.56 + 1.56 = 12.12. The score value at 3σ would
be _____.


13.68 (9 + 1.56 + 1.56 + 1.56)


11. PLATE 49. μ = 9, σ = 1.56, N = 18.

Determine the values of -1σ, -2σ, and -3σ.


7.44, 5.88, 4.32 
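
The deviation score method of frames 2 to 11 can be sketched in Python. The sketch assumes Formula 8 is σ = √(Σfx²/N), which is the procedure frames 2 and 3 describe; the function name is chosen only for this illustration, and the data are those of Plate 49.

```python
import math


def sigma_deviation_method(freq):
    """Return (mu, sigma) for a frequency distribution (score value -> f),
    assuming Formula 8 is sigma = sqrt(sum of f * x**2 / N), with x = X - mu."""
    N = sum(freq.values())
    mu = sum(X * f for X, f in freq.items()) / N
    # Square each deviation score first, then multiply by f, then sum.
    sum_fx2 = sum(f * (X - mu) ** 2 for X, f in freq.items())
    return mu, math.sqrt(sum_fx2 / N)


# Frequency distribution from Plate 49.
plate_49 = {12: 1, 11: 2, 10: 4, 9: 5, 8: 2, 7: 3, 6: 1}

mu, sigma = sigma_deviation_method(plate_49)
sigma_rounded = round(sigma, 2)

# Score values lying k standard deviations from the mean.
points = {k: round(mu + k * sigma_rounded, 2) for k in (-3, -2, -1, 1, 2, 3)}

print(mu, sigma_rounded)
print(points)
```

The score values at ±1σ, ±2σ, and ±3σ are computed from the rounded σ so that they match the hand calculation in frames 10 and 11.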


Plate 50

    X     f

   21     1
   20     2
   19     3
   18     5
   17     3
   16     2
   15     1


12. Determine the μ and σ of the frequency distribution. Use Plate 49
as a guide.





Plate 51

    X     f     fX      x     x²     fx²

   21     1     21      3      9      9
   20     2     40      2      4      8
   19     3     57      1      1      3
   18     5     90      0      0      0
   17     3     51     -1      1      3
   16     2     32     -2      4      8
   15     1     15     -3      9      9

  N = 17   ΣfX = 306          Σfx² = 40

  μ = 306/17 = 18

  σ = √(40/17) = 1.53


13. PLATE 51. μ = 18, σ = 1.53.

The value of -3σ is _____ and the value of 3σ is _____.
_____% of the scores lie between these score values.


13.41, 22.59, 99.78


14. When there is a large N and a lot of score values involved in a
distribution, it is very laborious to compute the σ by the deviation
score method. Another method of computing the σ is the raw score
method. This method does not use deviation scores. Instead, it
uses the _____ scores in its computation.


raw 


15. Formula 9 presents the formula for the computation of σ by the raw
score method. In this formula, the symbol ΣfX² indicates that you
must first square the _____ value; then multiply by its
f; then _____ all of the fX² values.


raw score, sum 


16. FORMULA 9. After obtaining ΣfX² you divide it by _____ (symbol) and
then subtract the square of the _____ (symbol). The last step is to take
the square root of the figure obtained.


N, μ




17. Plate 52 presents a frequency distribution and the calculation of σ
by the raw score method using Formula 9. Notice that each score
value is squared before it is multiplied by f, and then these fX²
values are _____ to obtain ΣfX².

Plate 52

    X     f     fX      X²     fX²

   12     1     12     144     144
   11     2     22     121     242
   10     4     40     100     400
    9     5     45      81     405
    8     2     16      64     128
    7     3     21      49     147
    6     1      6      36      36

  N = 18   ΣfX = 162          ΣfX² = 1502

  μ = 162/18 = 9

  σ = √(1502/18 - (9)²) = 1.56


summed


18. PLATE 52. Each f is multiplied by the square of the score value,
and these are summed to provide ΣfX². This figure is divided by
_____ (symbol). From this quotient, the square of the _____ (symbol) is
subtracted. The square root of this figure is the standard deviation.


N, μ


19. Plates 49 and 52 have the same frequency distribution. The σ obtained
in Plate 52 by the _____ score method is exactly the same
as that obtained in Plate 49 by the _____ score method.


raw, deviation 
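
The agreement between the two methods can be confirmed with a short Python sketch. It assumes Formula 9 is σ = √(ΣfX²/N - μ²), the procedure described in frames 15 to 18, and compares it with the deviation score method on the distributions of Plates 52 and 53. The function names are chosen only for this illustration.

```python
import math


def sigma_raw_score_method(freq):
    """sigma = sqrt(sum of f * X**2 / N - mu**2); assumed form of Formula 9."""
    N = sum(freq.values())
    mu = sum(f * X for X, f in freq.items()) / N
    sum_fX2 = sum(f * X ** 2 for X, f in freq.items())
    return math.sqrt(sum_fX2 / N - mu ** 2)


def sigma_deviation_method(freq):
    """sigma = sqrt(sum of f * x**2 / N), with x = X - mu."""
    N = sum(freq.values())
    mu = sum(f * X for X, f in freq.items()) / N
    return math.sqrt(sum(f * (X - mu) ** 2 for X, f in freq.items()) / N)


plate_52 = {12: 1, 11: 2, 10: 4, 9: 5, 8: 2, 7: 3, 6: 1}
plate_53 = {21: 1, 20: 2, 19: 3, 18: 5, 17: 3, 16: 2, 15: 1}

for name, freq in (("Plate 52", plate_52), ("Plate 53", plate_53)):
    raw = sigma_raw_score_method(freq)
    dev = sigma_deviation_method(freq)
    print(name, round(raw, 2), round(dev, 2))
```

The raw score method avoids computing a deviation score for every score value, which is why it is less laborious by hand when N is large.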


20. PLATES 49 and 52. μ = 9, σ = 1.56, N = 18.

The score value at 1σ is _____. _____% of the people
received scores between μ and 1σ. How many people received
scores between μ and 1σ?


10.56, 34.13, 6.14 


21. PLATE 50. Determine the μ and σ of the frequency distribution.
Use Formula 9 for the calculation of σ by the raw score method.
Use Plate 52 as a guide.


Plate 53

    X     f     fX      X²      fX²

   21     1     21     441      441
   20     2     40     400      800
   19     3     57     361     1083
   18     5     90     324     1620
   17     3     51     289      867
   16     2     32     256      512
   15     1     15     225      225

  N = 17   ΣfX = 306          ΣfX² = 5548

  μ = 306/17 = 18

  σ = √(5548/17 - (18)²) = 1.53


22. The standard deviation (σ) is the standard unit for describing the
deviations of a set of raw scores from the _____ (symbol).




23. Many times it is desirable to describe a deviation score (x) in terms
of how many standard deviations (σ) it represents. If, for raw score
12, the deviation score is 10, this means that the raw score lies
_____ points above the μ.


10 


24. If the raw score lies 10 points above the μ, then x = 10. If the σ of
the distribution is 5, this raw score is _____ σ above the μ.


2 


25. Similarly, if, for a particular score, x = 12 and σ = 4, then you know
that the raw score lies 12/4 or _____ above the μ.


3σ


26. Any raw score can be described in terms of how much it deviates
from the μ in standard deviation units, by dividing its x by the _____
(symbol) of the distribution.


σ


27. When you divide a deviation score by the standard deviation of the
distribution, the quotient is called a standard normal deviate, and
it is represented by the symbol z. Write the formula: _____


z = x/σ


28. Formula 10 gives the formula for z. This is called a standard
normal deviate because it is a standardized method of measuring
the _____ of a raw score in a normal distribution.


deviation 


29. The numerical value of a standard normal deviate (z) is called a
z-score. Thus, if, for a particular raw score, z = 1.7, this is referred
to as z-score 1.7. If z = 2.1, this is referred to as _____.


z-score 2.1


30. To use Formula 10 to determine the z-score for a particular X, you
must first subtract the μ from the X to determine _____ (symbol).


x


31. You are given the following information about a frequency distribution:
μ = 50, σ = 10. To determine the z-score for raw score
75, first determine x by using Formula 6. x = _____.


25  (75 - 50)


32. Same distribution: μ = 50, σ = 10. For raw score 75, x = 25. To
determine the z-score for this raw score, substitute the numerical
values for the symbols in Formula 10. z = _____.


2.5 (25/10) 


33. Same distribution: μ = 50, σ = 10. For raw score 75, the z-score
is 2.5. This means that the raw score lies _____ standard deviations
above the mean, or at a point designated as _____ σ.


2.5, 2.5
34. Same distribution: μ = 50, σ = 10. For raw score 75 the z-score
is 2.5, which is a positive number because, for this raw score, x
was positive. If, for a given raw score, x is negative, then its
z-score will also be _____.


negative 


35. Same distribution: μ = 50, σ = 10. Determine the x and the z-score
for the raw score 34. x = _____, z = _____.


-16 (34 - 50) 

-1.6 (-16/10) 
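

The two steps used in the preceding frames (Formula 6 for the deviation score, then Formula 10 for the z-score) can be sketched as a small function. The function name `z_score` is an illustrative choice, not something from the program.

```python
def z_score(raw, mu, sigma):
    """Return the standard normal deviate for a raw score."""
    x = raw - mu        # deviation score (Formula 6)
    return x / sigma    # standard normal deviate (Formula 10)

# Same distribution as the frames above: mean 50, standard deviation 10.
print(z_score(75, 50, 10))
print(z_score(34, 50, 10))
```

A raw score above the mean gives a positive z-score and a raw score below the mean gives a negative one, as in frames 34 and 35.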


36. Same distribution: μ = 50, σ = 10. For X = 75, z = 2.5.

The position of z-score 2.5 is shown graphically in Plate 54, page 107.
If you know that the z-score is 2.5, this is the same as saying that
the raw score lies at a point _____ σ above the μ.


2.5 


37. From Plate 47 you know that _____ % of the scores lie between
the μ and 2σ, which is the same as the percentage of scores that
lies between μ and z-score 2. However, you cannot determine from
Plate 47 what percentage of the scores lies between the μ and
z-score 2.5.


47.73 


38. For computational purposes, it is more convenient to express a percentage
in terms of a proportion, which is the decimal equivalent of
a percentage. For example: 34.13% is expressed as .3413, 50% is
expressed as .5, 49.38% is expressed as _____.


.4938
39. .65 is the decimal form of 65%. .5167 is the decimal form of 51.67%.
.7216 is the decimal form of _____ %.


72.16 


40. Table 1 presents the areas of the standard normal curve that correspond
to different values of z-scores. The 1st column of the
table presents the different values of z-scores, and the 2nd column
presents the proportion of the total area under the curve that lies
between the μ and the z-scores. The proportion for z-score 2.5 is _____.


.4938 
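

Table 1 itself is not reproduced here, but its second column can be approximated numerically. The sketch below assumes the table lists the area between the mean and z under the standard normal curve, and computes that area with the error function from Python's `math` module; the helper name `area_mean_to_z` is hypothetical.

```python
import math

def area_mean_to_z(z):
    """Proportion of the area under the normal curve between the mean and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

# A few of the z-scores used in this set, printed in the two-column layout of Table 1.
for z in (1.0, 1.25, 1.6, 1.65, 2.0, 2.5, 3.0):
    print(f"{z:4.2f}   {area_mean_to_z(z):.4f}")
```

Rounded to four places, these values should match the proportions quoted in the frames, for example .4938 for z-score 2.5.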


Plate 54

[Figure: normal curve with μ = 50; the area between the μ and z-score 2.5 is shaded.]


41. The area between the μ and z-score 2.5 is shaded. From Table 1,
this represents .4938 of the area under the normal curve. Because
the area is equivalent to the frequency of the scores in the distribution,
it follows that _____ of N lie between the μ and z-score 2.5.


.4938 
42. PLATE 54. N = 300.

If .4938 of the total number of scores (N) lie between the μ and
z-score 2.5, how many scores does this include?


.4938N = .4938(300) = 148.14


43. Recall that, in a normal distribution, .5 of the scores lie above the μ.
PLATE 54. If .4938N lie between the μ and z-score 2.5, the proportion
of scores that lies above z-score 2.5 is _____.


.0062 (.5000 - .4938) 


44. PLATE 54. If .4938N lie between the μ and z-score 2.5, the proportion
of N that lies below z-score 2.5 is _____.


.9938 (.5000 + .4938) 
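

The subtraction and addition in frames 43 and 44 can be sketched in code. This assumes, as the frames do, that .5 of the scores lie on each side of the mean; the function names are illustrative, and the area is computed with `math.erf` rather than read from Table 1.

```python
import math

def area_mean_to_z(z):
    """Area between the mean and z (the quantity tabulated in Table 1)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def proportion_above(z):
    """Proportion of scores lying above a given z-score."""
    area = area_mean_to_z(z)
    # Positive z: subtract from the upper half. Negative z: add the upper half.
    return 0.5 - area if z >= 0 else 0.5 + area

def proportion_below(z):
    """Proportion of scores lying below a given z-score."""
    return 1.0 - proportion_above(z)

print(round(proportion_above(2.5), 4), round(proportion_below(2.5), 4))
```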


45. Table 1 presents the proportions of scores between the μ and positive
z-scores, which represent proportions of N lying above the μ. Because
the normal curve is symmetrical, the proportions are the
same for negative z-scores, except that they represent proportions
of N lying _____ (above/below) the μ.


below 



Plate 55

[Figure: normal curve with the area between the μ and z-score -1.6 shaded.]

46. For raw score 34, the z-score is -1.6, as determined previously.
From Table 1, determine the proportion of N that lies between the
μ and z-score -1.6.


.4452 
47. PLATE 55. Table 1 indicates that .4452N lie between the μ and
z-score -1.6. This is represented by the shaded area in the plate.
Because the z-score is negative, this proportion lies _____
(above/below) the μ.


below 


48. PLATE 55. You have determined that between score 34 and the μ
lie .4452N. In this plate, .4452N = _____. This means that
there are _____ raw scores lying between raw scores 34
and _____.


133.56, 133.56, 50


49. A problem is presented in Plate 56. To determine how many
students received English scores above 89.9, you must first determine
x for English score 89.9. Use Formula 6. x = _____.


Plate 56


A group of sixty students are given an English examination. The μ score
for the group is English score 80. The σ of the English scores is 6. How
many students received English scores above 89.9?


9.9  (89.9 - 80)


50. PLATE 56. For English score 89.9, x = 9.9. This tells you that the
English score 89.9 deviates 9.9 points from the μ. It also tells you
that it is _____ (above/below) the μ.


above 


51. PLATE 56. You know that English score 89.9 deviates 9.9 points
above the μ because the deviation score (x) is _____
(positive/negative).


positive 


52. PLATE 56. You know that x = 9.9. Next, you must determine the
standard normal deviate which corresponds to the deviation score
of 9.9. Use Formula 10 to determine the z-score. z = _____.


1.65 (9.9/6) 


53. PLATE 56. x = 9.9. This corresponds to z-score 1.65. From Table
1, the proportion of students having scores falling between the μ and
z-score 1.65 is _____.


.4505 


54. PLATE 56. x = 9.9, z-score 1.65.

The proportion of students having scores between the μ and English
score 89.9 is .4505. The proportion of students having English
scores above English score 89.9 is _____. Recall that .5N lie
above the μ.


.0495  (.5000 - .4505)


55. PLATE 56. The proportion of students having English scores above
score 89.9 is .0495. The number of students that received scores
above 89.9 is _____.


2.97  (.0495 x 60)
56. Seventy-three soldiers in a platoon were each rated on their shooting
ability. The μ shooting score for the platoon was 25, and the σ was
8. How many soldiers received scores below shooting score 14.6?


Plate 57

Given: μ = 25, σ = 8, N = 73. How many lie below 14.6?

x = 14.6 - 25 = -10.4     (Formula 6)

z = -10.4 / 8 = -1.3      (Formula 10)

For z-score -1.3, .4032N have shooting scores between the μ and 14.6
(Table 1). Since .5000N lie below the μ, .5000N - .4032N = .0968N
have shooting scores below 14.6. N = 73. .0968N = .0968(73) = 7.0664
soldiers have shooting scores below 14.6.
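

The full procedure of Plates 56 and 57 (deviation score, z-score, proportion, then number of scores) can be sketched in a few lines. This is an illustration under the same normal-curve assumption; it uses `math.erf` in place of Table 1, so the last decimal place may differ slightly from the table-based answers, and the function name is hypothetical.

```python
import math

def proportion_below(raw, mu, sigma):
    """Proportion of a normal distribution lying below a raw score."""
    z = (raw - mu) / sigma                       # Formulas 6 and 10 combined
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Plate 56: sixty students, mean 80, standard deviation 6, scores above 89.9.
english_above = (1 - proportion_below(89.9, 80, 6)) * 60

# Plate 57: seventy-three soldiers, mean 25, standard deviation 8, scores below 14.6.
soldiers_below = proportion_below(14.6, 25, 8) * 73

print(round(english_above, 2), round(soldiers_below, 2))
```

Both results should fall very close to the worked answers of 2.97 students and 7.0664 soldiers.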


57. To determine how many scores lie above or below a certain score
in a set of data, you need to have three measures: the number of
scores (N); the mean (μ); and the _____ _____ (σ).


standard deviation (σ)
58. To say that an individual has a music score of 15 tells you nothing
about his music ability. To determine his ability relative to the
other musicians in the group, you must know the _____ (symbol), the
_____ (symbol), and the _____ (symbol).


N, μ, σ (any order)


Plate 58

George receives the same raw score in two subjects: a music score of 15
and a geometry score of 15. N = 50.

Music scores: μ = 20, σ = 5

Geometry scores: μ = 11, σ = 2

59. George’s music z-score is _____. George’s geometry z-score is _____.


-1, 2


60. PLATE 58. George’s z-score in music is -1. The proportion of
students receiving music scores above George is _____. (Use
Table 1.) George’s z-score in geometry is 2. The proportion of
students receiving geometry scores above George is _____.


.8413 (.5000 + .3413) 

.0228 (.5000 - .4772) 


61. PLATE 58. .8413N received music scores above George. .0228N
received geometry scores above George. How many students received
music scores above George? How many students received
geometry scores above George?


42.06  (.8413N = .8413 x 50)

1.14  (.0228N = .0228 x 50)


62. PLATE 58. In a class of 50 students, 42.06 students received higher
music scores than George and 1.14 students received higher geometry
scores than George. Conclusion: George is much better in
_____ than he is in _____.


geometry, music 
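

The comparison in Plate 58 can be reproduced with Python's standard `statistics.NormalDist` class. This is a sketch that assumes both sets of scores are normally distributed, as the frames do; the dictionary and variable names are illustrative.

```python
from statistics import NormalDist

N_STUDENTS = 50
GEORGE_SCORE = 15
subjects = {
    "music": NormalDist(mu=20, sigma=5),
    "geometry": NormalDist(mu=11, sigma=2),
}

results = {}
for name, dist in subjects.items():
    z = (GEORGE_SCORE - dist.mean) / dist.stdev      # z-score of the same raw score
    proportion_above = 1 - dist.cdf(GEORGE_SCORE)    # proportion scoring above George
    results[name] = (z, proportion_above, proportion_above * N_STUDENTS)
    print(f"{name}: z = {z:+.1f}, proportion above = {proportion_above:.4f}")
```

The same raw score of 15 gives a negative z-score in music and a positive z-score in geometry, which is why the raw score alone says nothing about relative standing.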
EXERCISES


1. (a) Use Formula 8 to calculate the σ of the following frequency
distribution by the deviation score method.

X f

10 1 

9 2 

8 3 

7 4 

6 5 

5 4 

4 3 

3 2 

2 1 

(b) Use Formula 9 to calculate the σ of the following frequency
distribution by the raw score method.

X f 

20 1 

19 4 

18 6 

17 3 

16 2 

15 1 

2. What is meant by the term standard normal deviate? What is its
symbol?

3. Determine the z-score for score 9 in Exercise 1(a); for score 16 in
Exercise 1(b).

4. Using Table 1, determine the proportion of scores between score 
values 5 and 8 in Exercise 1(a). What proportion lies between score 
values 16 and 18 in Exercise 1(b)? 

5. What proportion of scores lies above score value 9 in Exercise 1(a)? 
What proportion lies below score value 17 in Exercise 1(b)? 
SPECIFIC OBJECTIVES FOR SET 8


At the conclusion of this set you will be able to:

(1) express probability values in decimal form.

(2) use Table 1 to determine the probability of selecting a raw score
which lies between the mean and any given z-score.

(3) state the definition of sample and population.

(4) define the terms parameter and statistic.

(5) identify symbols which represent population parameters and
sample statistics.

(6) identify the symbols P, X̄, and s.
1. If you toss a coin a number of times, the chances are that it will
turn up heads one-half, or 50%, of the time. That is, the probability
of heads occurring is _____ %.


50 


2. In statistics, the symbol for probability is P, and the percentage is
expressed in decimal form. The probability that a coin will turn up
heads may be expressed as P = .5, which means that the probability
is _____ % that it will turn up heads.


50 


3. If the probability of occurrence is 25%, this would be expressed as
P = .25. If the probability of occurrence is 75%, this would be expressed
as P = _____.


.75 


4. If you determined the μ of a set of normally distributed scores, then
wrote each score on a slip of paper, placed them in a hat, and then
drew one of them out, the probability that the score selected would
lie above the μ may be expressed as P = _____ because _____ % of
the scores lie above the μ in a normal distribution.


.50, 50 


5. The probability of selecting a score above the μ is .5. The probability
of selecting a score above Q1 may be expressed as P = _____,
because _____ % of the scores in the distribution lie above Q1.


.75, 75 
6. Table 1 can be interpreted as indicating the probability of selecting
a score between the μ and the various z-scores. Thus, the probability
of selecting a score between the μ and 1σ may be expressed as
P = .3413. What is the probability of selecting a score between the μ
and 1.6σ? P = _____.


.4452 


7. TABLE 1. The probability of selecting a score that lies between the
μ and 1.6σ is .4452. This means that, out of 100 selections, 44.52
of them would probably have scores that lie between the μ and 1.6σ.
How many out of 100 selections would probably have scores between
the μ and 1.25σ?


39.44 


8. TABLE 1. The probability of selecting a score that lies beyond 3σ
is P = _____.


.0013 (.5000 - .4987) 


9. TABLE 1. The probability of selecting a score that lies beyond -2.1σ
is P = _____.


.0179  (.5000 - .4821)


10. TABLE 1. The probability of selecting a score that lies between
-2σ and 2σ is P = _____.


.9544 (.4772 + .4772) 


11. The probability of selecting a score that lies between -2σ and 2σ is
.9544. How many out of 100 selections would probably have scores
that lie between these two points?


95.44 
Plate 59


You are given the following information about a set of scores that are
normally distributed: μ = 50, σ = 5.

(a) What is the probability that a score selected at random from this
distribution is 57 or more?

(b) What is the probability that a score selected at random is between
40 and 55?

12. PLATE 59(a). To solve this problem, you must first determine the
z-score for the raw score of 57. Use Formulas 6 and 10.
z = _____.


1.4  (x = 57 - 50 = 7; 7/5 = 1.4)


13. PLATE 59(a). For raw score 57, x = 7 and z = 1.4.
TABLE 1. Between μ and z-score 1.4, P = _____.


.4192 


14. PLATE 59(a). For raw score 57, z = 1.4. Between μ and z-score
1.4, P = .4192. You know that above the μ, P = .5. Therefore, the
probability of obtaining a score of 57 or more is P = _____.


.0808  (.5000 - .4192)


15. PLATE 59(a). The probability that a randomly selected score is 57
or more is .0808. This means that if you continued selecting scores
at random, _____ % of the time you would select scores that are
57 or more.


8.08 


16. PLATE 59(b). To solve this problem you must first determine the
z-score for each of the raw scores. For raw score 40, z = _____.
For raw score 55, z = _____.


-2, 1
17. PLATE 59(b). For raw score 40, z = -2. For raw score 55, z = 1.

TABLE 1. The probability of selecting a score between the μ and
z-score -2 is _____. The probability of selecting a score
between the μ and z-score 1 is _____. Therefore, the probability
of selecting a score between 40 and 55 is _____.


.4772, .3413, .8185 


18. PLATE 59(b). The probability that a randomly selected score is
between score 40 and 55 is .8185. This means that if you continued
selecting scores at random, _____ % of the time you would
select scores that lie between 40 and 55.


81.85 
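

Both parts of Plate 59 can be checked with `statistics.NormalDist` from the Python standard library. This sketch treats the scores as a continuous normal variable, which is the assumption made in the frames; the computed values may differ from the Table 1 answers in the fourth decimal place because the table entries are rounded before they are added.

```python
from statistics import NormalDist

scores = NormalDist(mu=50, sigma=5)

# (a) Probability that a randomly selected score is 57 or more.
p_57_or_more = 1 - scores.cdf(57)

# (b) Probability that a randomly selected score lies between 40 and 55.
p_40_to_55 = scores.cdf(55) - scores.cdf(40)

print(round(p_57_or_more, 4), round(p_40_to_55, 4))
```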


19. When you measure the height of all of the seventh-grade children 
in New York City, you have measured the population of seventh- 
grade children in New York City. If you measure only fifty of these 
children, you have measured a sample of the__. 


population 


20. A population is defined by its descriptive terms. That is, all girls 
in high school English classes can be termed a population. All Boy 
Scouts in Troop 264, or all eight-day-old rats in a laboratory, can 
be termed a 


population 


21. A sample is any portion of the population less than the total population
from which it is selected. Thus, if there are one hundred elementary
school teachers in a population, fifteen of these would be a _____
of the population.


sample 
22. When you have measured all of the people or objects of a certain
description, you have measured the population. When you have
measured a selected group from the population, you have measured
a _____ of the population.


sample 


23. Usually it is not possible to obtain measurements on every individ¬ 
ual or object in the population. Therefore, you select a_ 

and make the assumption that this_is representative 

of the population from which it is selected. 


sample, sample 


24. It would be almost impossible to measure the height of every
seventh-grade child in New York City, but you can measure the
height of a sample of seventh-grade children and make inferences
about the height of this _____ from the measurements
obtained on the sample.


population 


25. If you wish to make inferences about the height of the population of
seventh-grade children in New York City from measurements you
obtain on a sample of that population, the sample must be representative
of the _____.



population 


26. A sample selected to represent as nearly as possible the population
from which it is selected is called a representative sample. The
most common method of obtaining a sample that is _____
of the population is by selecting it randomly.


representative 
27. When a sample is selected from a _____ by some
random method, it is called a random sample.


population 


28. If you place the names of all seventh-grade children in New York City
in a barrel, and draw out fifty names, you will have a _____
sample of this population.


random 


29. You may wish to know the mean height of the population of seventh-
grade children in New York City, but you cannot measure the height
of all children in this population. Instead, you can obtain the mean
height of a _____ of this population.


sample 


30. From the mean score of a sample, you can make an inference about
the mean score of the _____ of which the sample is
representative.


population 


31. If you know the standard deviation of scores for a sample, you can
make an inference about the _____ _____
of the scores in the population.


standard deviation 


32. If your sample is a random sample of the seventh-grade children
in New York City, you may make inferences from this sample to
the _____ of seventh-grade children in New York
City.


population 
33. You may not make inferences from this sample to all seventh-grade
children in the United States, because your sample is not representative
of this _____.


population 


34. Your sample only permits you to make inferences about the population
from which it is _____.


selected 


35. The mean score for a sample is called a sample mean. The mean
score for a population is called a _____ _____.


population mean 


36. You already know the symbol for a population mean. It is _____
(symbol). The symbol for a sample mean is X̄.


μ


37. When you refer to the mean of a population, you use _____ (symbol).
When you refer to the mean of a sample, you use _____ (symbol).


μ, X̄


38. You already know the symbol for the standard deviation of a population.
It is _____ (symbol). The symbol for the standard deviation of a
sample is s.


σ
39. Greek letters are used as symbols for the characteristics of populations.
Roman letters are used as symbols for the characteristics
of _____.


samples 


40. The symbol for a population mean is _____ (symbol). The symbol for
the standard deviation of a population is _____ (symbol).


μ, σ


41. The symbol for a sample mean is _____ (symbol). The symbol for the
standard deviation of a sample is _____ (symbol).


X̄, s


42. Characteristics of populations are represented by _____ (Greek/
Roman) letters. Characteristics of samples are represented by
_____ (Greek/Roman) letters.


Greek, Roman 


43. A characteristic of a population, such as its μ or σ, is called a
parameter. The μ height of all seventh-grade children in New York
City is called a _____.


parameter 


44. A parameter is a characteristic of a population. A statistic is a
characteristic of a sample. The μ of a population is called a
_____. The X̄ of a sample is called a _____.


parameter, statistic 
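

The distinction between parameters and statistics can be illustrated by drawing a random sample from a made-up population. Everything in this sketch is hypothetical: the population of 10,000 heights is simulated, and the sample standard deviation is computed with the same divide-by-N formula used for σ so far.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is repeatable

# Hypothetical population: 10,000 heights in inches (simulated values).
population = [random.gauss(60, 3) for _ in range(10_000)]

# A random sample of fifty, like drawing fifty names from a barrel.
sample = random.sample(population, 50)

mu = statistics.fmean(population)        # parameter: population mean
sigma = statistics.pstdev(population)    # parameter: population standard deviation
x_bar = statistics.fmean(sample)         # statistic: sample mean
s = statistics.pstdev(sample)            # statistic: sample standard deviation

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: X-bar = {x_bar:.2f}, s = {s:.2f}")
```

The statistics will usually be close to the parameters without being identical to them.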
45. Memory hint: The words "population" and "parameter" both begin
with the letter "_____." The words "sample" and "statistic" both begin
with the letter "_____."


p, s


46. The σ of the height of the population of seventh-grade children in New
York City is called a _____ (parameter/statistic).


parameter 


47. The s of a sample is called a _____ (parameter/
statistic).


statistic


48. The mean of a sample is written _____ (symbol) and is called a
_____ (parameter/statistic). The mean of a population
is written _____ (symbol) and is called a _____
(parameter/statistic).


X̄, statistic, μ, parameter
EXERCISES


1. What does the symbol P represent? 

2. In a certain arithmetic test there is a probability that one-half of 
the students will score above 82. How is this probability expressed 
in decimal form ? 

3. If there is a 15% chance of getting a grade of A in a certain English 
course, how can this probability be expressed in decimal form? 

4. 95% of the population own television sets. What is the probability, 
expressed in decimal form, of randomly selecting a person from 
this population who does not own a television set? 

5. Using Table 1, determine the probability of obtaining a score between:

(a) μ and 1σ

(b) μ and 2.13σ

(c) -.64σ and 2.05σ

(d) 1.75σ and 2.25σ

6. What is meant by the terms population and sample?

7. What term is used to describe characteristics of a population? Of a
sample?

8. What type of symbols are used to represent population characteristics?
Sample characteristics?

9. What do the symbols μ, σ, X̄, and s represent?
set 9

SAMPLING ERROR


When a sample is drawn from a population, it is unlikely that the
statistics derived from the sample are identical to the population parameters.
There is always some error involved when a sample is selected
from a population. This set will discuss the concept of sampling error,
and the new terms, sampling distribution of means and standard error
of the mean, will be introduced.

Interpreting the distribution of sample means allows us to determine 
probabilities associated with various sample mean values in relation to 
the population mean. A method for deriving these probabilities will be 
presented, as well as the relationship of the size of the sample to the 
standard error of the mean. 

SPECIFIC OBJECTIVES OF SET 9 

At the conclusion of this set you will be able to: 

(1) define the standard error of the mean. 



(2) determine from Table 1 the probability of selecting any given 
sample mean in a sampling distribution. 

(3) state the relationship of the number of scores in a sample to 
the size of the standard error of the mean. 

(4) identify the symbol σ_X̄.
1. Suppose, from one population, you select two different samples of
equal size. Because it is difficult to obtain a sample that is exactly
representative of a population, it is likely that the X̄’s of these two
samples _____ (will/will not) vary.


will 


2. You may expect that the X̄’s of two samples, drawn from the same
population, will differ. This is due to sampling error, because it is
difficult to obtain a sample perfectly _____
of a population.


representative 


3. The X̄’s of samples drawn from the same population are seldom
exactly the same. This is due to sampling _____.

error 


4. If you obtain many samples from a population, it is likely that the
_____’s (symbol) of the samples will vary.


X̄


5. This variability among the X̄’s of many samples drawn from the
same population is due to _____ _____.

sampling error 
6. If you draw many samples from a population, the _____’s (symbol) of
these samples will _____ due to sampling error.


X̄, vary


Plate 60

Means for Twenty-Six Samples, Each Containing N = 50 Scores

4  6  6  3  5  7  4  5  3  4  5  4  1
4  3  3  3  2  4  4  5  4  5  2  6  2

7. This plate presents the _____’s (symbol) of a large number of
_____ selected from the same population.


X̄, samples


8. PLATE 60. Each number in this plate represents the _____ (symbol)
of one _____.


X̄, sample


9. NOTE: The principle to be illustrated only holds true if there is a
very large number of samples involved. For ease of presentation,
Plate 60 presents the X̄’s of twenty-six samples. The assumption
should be made, however, that there is a very large number of
samples involved in this illustration.


go to next frame 











10. PLATE 60. Each sample whose X̄ is presented contains _____
(number) scores.


50 


11. A frequency distribution of a set of sample X̄’s is called a sampling
distribution of X̄’s. Prepare the sampling distribution of the X̄’s
presented in Plate 60.


Plate 61

Sampling Distribution of X̄’s

Frequency distribution of the X̄’s of
twenty-six samples selected from the
same population. N = 50 for each
sample.

X̄     f

7     1
6     3
5     5
4     8
3     5
2     3
1     1

X̄ = 4     σ_X̄ = 1.4



12. PLATE 61. This frequency distribution is called a _____
distribution of _____’s (symbol) because each X̄ in the distribution is the
mean of a sample.


sampling, X̄
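

The step from Plate 60 to Plate 61 can be sketched in code: tally the twenty-six sample means into a frequency distribution, then take the mean and standard deviation of those means. The variable names are illustrative, and the standard deviation is computed with the divide-by-N formula, which reproduces the 1.4 shown in the plate.

```python
from collections import Counter
import statistics

# The twenty-six sample means listed in Plate 60.
sample_means = [4, 6, 6, 3, 5, 7, 4, 5, 3, 4, 5, 4, 1,
                4, 3, 3, 3, 2, 4, 4, 5, 4, 5, 2, 6, 2]

# Sampling distribution: each distinct mean with its frequency.
distribution = Counter(sample_means)
for value in sorted(distribution, reverse=True):
    print(value, distribution[value])

mean_of_means = statistics.fmean(sample_means)      # mean of the sample means
standard_error = statistics.pstdev(sample_means)    # standard deviation of the sample means

print(mean_of_means, round(standard_error, 1))
```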
13. If you prepare a frequency distribution of a large number of sample
X̄’s, you find that it is in the shape of a normal curve. Thus, the
sampling _____ of X̄’s is in the form of a normal
distribution. (This is true only when each sample contains a large
number of raw scores.)


distribution 


14. PLATE 61. The variability among the X̄’s presented in this sampling
distribution of X̄’s is due to sampling _____.


error 


15. When you compute the standard deviation of a sampling distribution
of X̄’s, it is called the standard error of the X̄ because it is a standard
measure of the _____ involved in obtaining a sample that
is perfectly representative of the population.


error 


16. The symbol for the standard error of the X̄ is written σ_X̄. The symbol
σ indicates that the standard error is the standard _____
of a set of sample means.


deviation 


17. PLATE 61. The standard error of the mean in this example is _____.


1.4 
18. PLATE 61. σ_X̄ = 1.4. This indicates that the standard _____
of the X̄ for this set of sample X̄’s is 1.4.


error 


19. The standard error of the X̄ is interpreted in exactly the same
manner as the standard _____ of raw scores.


deviation 


20. When you have a large number of sample X̄’s, the best estimate that
you can make regarding the population μ is to determine the _____
of the X̄’s.


mean 


21. The X̄ of the sample X̄’s is the best estimate you have of the
_____ (symbol).


population μ


22. The X̄ of the sample X̄’s is the best estimate you have of the value
of the population μ. Thus, in Plate 61, the best estimate that you can
make regarding the population μ is that its value is _____.


4 


23. PLATE 61. The X̄ of this sampling distribution of X̄’s is 4. Therefore,
4 is the best estimate of the _____ (symbol)
because it is the most representative of all the sample X̄’s.


population 
24. PLATE 61. Assume that the value of the population μ is, in fact, 4.
The standard error of the X̄ in this plate is _____.


1.4 


25. PLATE 61. σ_X̄ = 1.4. This indicates that the standard error of the
X̄ is 1.4 score points. Recall that the standard error of the X̄ is
interpreted exactly the same as the standard _____
of raw scores.


deviation 


26. The percentage of scores that lies between the X̄ and 1σ is _____ %.


34.13 


27. Between the X̄ and 1σ lie 34.13% of the scores. In interpreting the
standard error of the X̄, the percentage of sample X̄’s that lies between
the population μ and 1 standard error is also _____ %.


34.13


28. PLATE 61. The value of the sample X̄ that lies one standard error
above the μ is _____.


5.4 


29. PLATE 61. The percentage of sample X̄’s that lies between values
4 and 5.4 is _____ %.


34.13 
30. Recall that the percentages can be expressed as proportions. Thus,
50% is expressed as .5000. 34.13% is expressed as _____.


.3413 


31. PLATE 61. Thus, the proportion of the samples that have X̄’s between
4 and 5.4 is _____.


.3413 


32. Recall that proportions may also be expressed as probabilities.
Thus, in Plate 61, if 50%, or .5, of the X̄’s lie above the μ, it can be
said that the probability of obtaining a sample whose X̄ is above the
population μ is _____.


.5 


33. The probability of obtaining a X̄ that lies above the μ is P = .5000.
The probability of obtaining a X̄ that lies between the μ and 1σ_X̄ is
P = _____.

.3413 


34. PLATE 61. μ = 4, σ_X̄ = 1.4.

Using Formula 10, determine the z-score of a value of 6.1.


z = 2.1/1.4 = 1.5


35. PLATE 61. For value 6.1, z = 1.5.

Using Table 1, the probability of obtaining a sample X̄ that is between
the μ and 6.1 is P = _____.


.4332 
36. PLATE 61. The probability of a sample X̄ lying between the μ and
6.1 is P = .4332. This means that _____ % of the sample X̄’s
lie between the μ and 6.1.


43.32 


37. PLATE 61. To determine the probability of obtaining a sample
X̄ that is between score value 2.1 and μ, first compute the z-score.
z = _____. P = _____.


-1.36  (x = 2.1 - 4 = -1.9; z = -1.9/1.4 = -1.36)

.4131


38. The probability of obtaining a sample X̄ that is between 2.1 and μ is
P = .4131. The probability of obtaining a X̄ that lies between the μ
and 6.1 is P = .4332. Therefore, the probability of obtaining a X̄ that
lies between 2.1 and 6.1 is _____.


.8463 


39. PLATE 61. Determine the probability of obtaining a sample whose
X̄ lies between 2.6 and 6.8.


Plate 62

x = 2.6 - 4 = -1.4               x = 6.8 - 4 = 2.8

z = -1.4/1.4 = -1                z = 2.8/1.4 = 2

From Table 1: P = .3413          From Table 1: P = .4772


P = .3413 + .4772 = .8185
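

The calculation in Plate 62 applies the z-score method to sample means, with the standard error taking the place of the standard deviation. A minimal sketch, assuming the sampling distribution is normal with μ = 4 and σ_X̄ = 1.4 as in Plate 61 (variable names are illustrative):

```python
from statistics import NormalDist

MU = 4.0                # assumed population mean (Plate 61)
STANDARD_ERROR = 1.4    # standard error of the mean (Plate 61)

sampling_distribution = NormalDist(mu=MU, sigma=STANDARD_ERROR)

z_low = (2.6 - MU) / STANDARD_ERROR     # z-score for sample mean 2.6
z_high = (6.8 - MU) / STANDARD_ERROR    # z-score for sample mean 6.8

# Probability that a sample mean falls between the two values.
p_between = sampling_distribution.cdf(6.8) - sampling_distribution.cdf(2.6)

print(round(z_low, 2), round(z_high, 2), round(p_between, 4))
```

The computed probability agrees with the plate's .8185 to within rounding of the table entries.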
40. PLATE 62. In a sampling distribution of X̄’s in which the μ is 4
and the σ_X̄ is 1.4, the probability of obtaining a sample whose X̄ lies
between 2.6 and 6.8 is _____.


.8185 


41. If you know the μ and the _____ _____ of the X̄, you
can determine the probability of a sample mean lying between any
two selected score values.


standard error 


42. PLATE 61. Each of the twenty-six samples whose means are presented
contains _____ raw scores.


50 


43. If the number of scores within each of the samples is increased, it
follows that there is _____ (more/less) error involved in the
sampling.


less 


44. There is less error involved in a large sample than in a small
sample. Therefore, in a sampling distribution of X̄’s where each
sample contains thirty scores, the σ_X̄ will be _____ (less/
greater) than where each sample contains fifty scores.


greater 
Plate 63

[Figure: three sampling distributions of X̄’s, for samples containing thirty, fifty, and seventy-five scores.]


45. Here are presented three sampling distributions of X̄’s. Each distribution
contains the X̄’s of _____ samples.


100 


46. PLATE 63. In the top sampling distribution, where each sample
contains thirty scores, the σ_X̄ = 3; whereas, in the second distribution,
where each sample contains fifty scores, the σ_X̄ = _____.


2 


47. PLATE 63. In the bottom sampling distribution of X̄’s, where each
sample contains seventy-five scores, the σ_X̄ = _____.


1 


48. PLATE 63. From these three sampling distributions of X̄’s, it can
be seen that as the sample size is increased, the σ_X̄ is _____.


decreased 


49. PLATE 63. As the number of scores in each sample is increased,
there is _____ (more/less) error in the sample X̄’s.


less 


50. PLATE 63. Here it can be seen that the larger the sample size, the
_____ (more/less) confidence you have that any particular sample
X̄ approximates the μ.


more 
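

The relationship shown in Plate 63 can also be explored by simulation. The sketch below is only an illustration: it draws many samples of thirty, fifty, and seventy-five scores from a hypothetical normal population (mean 50, standard deviation 10, both made-up values) and computes the standard deviation of the resulting sample means. The numerical results will not match the plate, which uses a different population.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the run is repeatable

def standard_error_of_mean(sample_size, n_samples=2000, mu=50, sigma=10):
    """Estimate the standard error by simulating many samples of one size."""
    means = []
    for _ in range(n_samples):
        sample = [random.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    # Standard deviation of the sample means = standard error of the mean.
    return statistics.pstdev(means)

results = {n: standard_error_of_mean(n) for n in (30, 50, 75)}
for n, se in results.items():
    print(n, round(se, 2))
```

As in the plate, the estimated standard error shrinks as the number of scores in each sample grows.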
EXERCISES


1. What is meant by the term standard error of the mean? What is its
symbol?

2. Using Table 1, determine the probability of selecting a sample whose
mean score lies

(a) between μ and 2σ_X̄

(b) between -1σ_X̄ and 2σ_X̄

(c) between -2.05σ_X̄ and .5σ_X̄

3. Assume the μ of a set of data is 12 and the σ_X̄ is 3. Using Table 1,
determine the probability that you will select a sample whose mean
score lies

(a) between 9 and 12 

(b) between 12 and 16 

(c) between 10 and 13 

4. If the number of scores in a sample is increased, how is the standard 
error of the mean affected? 
set 10

MEASURES OF VARIABILITY: VARIANCE


The last set dealt with a method of determining the probabilities
associated with any given sample mean when the population mean is
known. In research, however, we are almost never fortunate enough
to know the value of the population mean. Usually, we have only the data
contained in one sample from which to infer values of the population
parameters; and before we can estimate the population mean, we must
estimate the variability in the population.

This set presents the method for estimating the standard deviation of
a population when you have data for only one sample of that population.
We will define the term variance and explain its relationship to the standard
deviation. There is an introduction to the concept of sum of squares
and its calculation by the raw score and deviation score methods. We
will also distinguish between the biased estimate and the unbiased estimate
of the population variance based on a single sample, and explain
how the concept of degrees of freedom is used to derive unbiased estimates
of the population variance.
SPECIFIC OBJECTIVES OF SET 10


At the conclusion of this set you will be able to: 

(1) use Formula 11 to calculate the sum of squares for a frequency 
distribution by the deviation score method. 

(2) use Formula 12 to calculate the sum of squares for a frequency 
distribution by the raw score method. 

(3) state the relationship between the standard deviation and 
variance. 

(4) determine the degrees of freedom for a frequency distribution. 

(5) use Formulas 13 and 15 to calculate an unbiased estimate of
population standard deviation and variance from a sample
frequency distribution by the deviation score method.

(6) use Formulas 14 and 16 to calculate an unbiased estimate of the
population variance and standard deviation from a sample
frequency distribution by the raw score method.

(7) identify the symbols Σx², s², s, and df.
1. In the examples that have been presented, the standard error of the
X̄ has been obtained by computing the standard ____
of a set of sample X̄'s.


deviation 


2. You seldom have a large number of samples; however, from the data 
contained in one sample you can estimate the value of the population 
_(parameters/statistics). 


parameters 


3. In order to make estimates of population_ 

(parameters/statistics) from the_(parameters/ 

statistics) of one sample, you need to learn some new terms. The 
first term is sum of squares. 


parameters, statistics 


4. FORMULAS 11 and 12. These formulas for the "sum of squares"
use Σx² instead of Σfx² as used in Formula 8. The f was included
earlier only for illustrating the computation. The two expressions
Σx² and Σfx² mean exactly the same thing: the sum of the squared
____ scores.


deviation 


5. FORMULAS 11 and 12. These two formulas give two methods of
calculating Σx²: the deviation score method and the ____
method.


raw score 
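
The two methods named in the frames above can be sketched in Python. This is only a minimal illustration, assuming Formulas 11 and 12 take their usual textbook forms (deviation score method: Σx² = Σ(X − X̄)²; raw score method: Σx² = ΣX² − (ΣX)²/N). The formulas themselves are not reproduced in this section, and the function names and the five scores are chosen purely for illustration.

```python
def sum_of_squares_deviation(scores):
    """Deviation score method: sum the squared deviations from the sample mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores)


def sum_of_squares_raw(scores):
    """Raw score method: sum of squared scores minus (sum of scores)^2 / N."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


# Illustrative scores (an assumption, not data from Formulas 11 and 12).
scores = [10, 20, 30, 40, 50]
print(sum_of_squares_deviation(scores))  # 1000.0
print(sum_of_squares_raw(scores))        # 1000.0
```

Both functions return the same value, which is the point of the two formulas: the raw score form avoids computing each deviation and is often easier by hand.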






6. FORMULAS 11 and 12. These two formulas are equivalent and
yield identical values. The raw score formula is given because it
is sometimes easier to use in computation. The expression for
"sum of squares" is ____ (symbols).


Σx²


7. FORMULAS 11 and 12. The expression, Σx², is called the "sum
of squares" and indicates that you ____ the squared deviation
scores of the sample.


sum 


8. FORMULAS 11 and 12. Σx² is called the ____ of ____.


sum, squares 


9. FORMULAS 13 THROUGH 16. The symbol for the standard deviation
estimate is s. The symbol for the variance estimate is s². Thus,
the standard deviation is the ____ of the variance.


square root 


10. FORMULAS 13 THROUGH 16. These formulas introduce a new term,
variance. The symbol for variance is ____ (symbol). (See Formulas
13 and 14.)


s²


11. FORMULAS 13 and 14. The Roman symbol ____ (symbol) is used
for variance to indicate that it is an estimate, from sample data, of
the population variance.


s²


12. FORMULAS 13 and 14. The symbol for the population variance is
σ². The symbol for the estimate of the population variance, from
sample data, is ____ (symbol).


s²


13. The symbol for the population standard deviation, you will recall,
is ____ (symbol). The symbol for the estimate of the population
standard deviation, from sample data, is s.


σ


14. FORMULAS 15 and 16. These are equivalent formulas for estimating
the standard deviation of a population from sample data. The symbol 
for this estimate is_(symbol). 


s 


15. The formula for the variance of a population is:

σ² = Σx² / N

The formula for the estimate of the population variance, from sample
data, as presented in Formula 13, is:

s² = Σx² / ____ (symbol)


N-1 


16. σ² = Σx² / N

Formula 13, for estimating the population variance from sample
data, differs from the above formula for the population variance,
in that it uses N-1 as the denominator term instead of ____ (symbol).


N 
17. FORMULA 13. If you use N instead of N-1 in the formula, the result
would tend to be an underestimate of the population variance. Hence, it
would be called a biased ____ of the population variance.


estimate 


18. The use of N-1 in the denominator of Formula 13 yields an unbiased
estimate of σ². Thus, Formula 13 yields a ____ (more/less)
accurate estimate of the population variance.


more 
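
The difference between dividing by N and dividing by N-1 can be seen numerically. The sketch below is an illustration only: the function name and the five scores are assumptions, and the sum of squares is computed by the deviation score method.

```python
import math


def variance_estimates(scores):
    """Return (biased variance, unbiased variance, unbiased standard deviation)."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    biased = sum_of_squares / n          # divides by N: tends to underestimate
    unbiased = sum_of_squares / (n - 1)  # divides by N-1 (degrees of freedom)
    return biased, unbiased, math.sqrt(unbiased)


# Illustrative scores (an assumption, not data from the text's formulas).
biased, unbiased, s = variance_estimates([10, 20, 30, 40, 50])
print(biased)    # 200.0
print(unbiased)  # 250.0
print(round(s, 2))
```

The estimate that divides by N is always the smaller of the two, which is why it tends to underestimate the population variance.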


19. FORMULA 13. The term, N-1, indicates that you subtract ____ from

the number of scores in the sample. A discussion of the meaning 
of this term follows. 


1 


Plate 64

Example A:    Example B:    Example C:

    10            16             5
    20            22            11
    30            35            37
    40            37            45
    50           (  )          (  )

 X̄ = 30        X̄ = 30        X̄ = 30


20. PLATE 64. To illustrate the concept of degrees of freedom, first
compute the X̄ of the five scores in Example A. X̄ = ____


30 


21. PLATE 64. EXAMPLE A. Now compute the deviation of each score
from the X̄. The five deviation scores are ____, ____, ____, ____,
and ____.


-20, -10, 0, 10, and 20 
22. PLATE 64. EXAMPLE A. The five deviation scores are -20, -10, 0,
10, and 20. It is evident that the sum of the deviation scores is ____.


0 


23. PLATE 64. EXAMPLE A. In order for the sum of the deviation
scores to equal zero, if the first four score values are 10, 20, 30, and
40, the fifth score can have only one value. It must be ____.


50 


24. PLATE 64. EXAMPLE A. If X̄ = 30, four of the scores may be of
any value, but the fifth score is not ____ to vary.


free


25. PLATE 64. EXAMPLE B. If X̄ = 30 and four of the scores have
values 16, 22, 35, and 37, then in order for the sum of the deviation
scores to equal zero the fifth score must have the value ____.


40 


26. PLATE 64. EXAMPLE C. In order for the sum of the deviation 
scores to equal zero, the fifth score in this sample must be_. 


52 


27. Thus, for any set of data we may generalize that the sum of the 
_scores must equal zero. 


deviation 


28. As illustrated in the above examples, if there are five scores in a 
sample, four of these scores may be of any value, but the fifth is 
not_to vary. 


free 
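
The reasoning in the Plate 64 frames can be checked with a short Python sketch. Once the mean and N-1 of the scores are fixed, the last score is determined, because the deviation scores must sum to zero. The function name is an assumption made for this illustration.

```python
def forced_last_score(known_scores, mean, n):
    """Return the one value the remaining score must take.

    Since the deviations from the mean sum to zero, the scores must sum
    to n * mean, so the last score is n * mean minus the known scores.
    """
    return n * mean - sum(known_scores)


print(forced_last_score([10, 20, 30, 40], mean=30, n=5))  # Example A: 50
print(forced_last_score([16, 22, 35, 37], mean=30, n=5))  # Example B: 40
print(forced_last_score([5, 11, 37, 45], mean=30, n=5))   # Example C: 52
```

With five scores, only four are free to vary, which is the N-1 used as the degrees of freedom.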
29. FORMULA 13. This formula indicates that to obtain s² you divide
Σx², which is called the ____ of ____, by N-1, which
is called the ____ of ____.


sum, squares, degrees, freedom 


30. When you divide Σx² by N-1, you obtain an estimate of the
____ of the scores in the population.


variance 


31. When you know the variance of a population, you use ____ (symbol).
When you estimate the variance of a population from data contained
in a sample, you use ____ (symbol).


σ², s²


32. If you have data from a sample and wish to make an estimate of the
variance of the population from which the sample was selected, you
use Formula 13. The symbol s² is used in this formula to indicate
that the variance is computed from data contained in a ____
(sample/population).


sample 

33. FORMULAS 13 THROUGH 16. For statistics obtained from a sample:
the symbol for the variance is ____ (symbol); the symbol for the
standard deviation is ____ (symbol).


s², s


34. FORMULAS 13 THROUGH 16. The variance is the square of the
____.


standard deviation
EXERCISES


1. For the following frequency distributions, calculate the "sum of
squares" by the deviation score method, using Formula 11, and by
the raw score method, using Formula 12. Check to make certain
that your answers are identical for the two methods.

(a)   X    f        (b)     i      f

     10    2              26-30    2
      9    3              21-25    3
      8    1              16-20    5
      7    4              11-15    4
      6    2               6-10    3
      5    2               1-5     3
      4    1
      3    1

2. How many degrees of freedom are associated with the frequency 
distributions in Exercises 1(a) and 1(b)? 

3. What is the relationship of the standard deviation to the variance of 
a frequency distribution? 

4. Calculate the unbiased estimate of the population standard deviation 
for the distributions in Exercises 1(a) and 1(b). Use Formula 15. 

5. Calculate the unbiased estimate of the population standard deviation 
for the distributions in Exercises 1(a) and 1(b). Use the raw score 
method as presented in Formula 16. Check your answers with those 
obtained in Exercise 4. 

6. What do the symbols Σx², s², s, and df represent?
STANDARD ERROR OF THE MEAN
CONFIDENCE INTERVAL 


It has been pointed out earlier that if there are a large number of
samples, the standard deviation of the sample means can be calculated
directly to obtain the standard error of the mean. This set will
demonstrate that, from the data in one sample, we can estimate the standard
error of the mean. We know the sample mean; what we do not know is
the population mean. However, we can hypothesize a value for the
population mean and then determine the probability of obtaining a
sample whose mean has the value of ours.

Confidence interval is defined, and a method given for calculating it. 
The confidence interval allows the researcher to make a hypothesis 
about the interval within which a sample mean will lie, and to state the 
degree of confidence he has that he is accurate. 

SPECIFIC OBJECTIVES OF SET 11 

At the conclusion of this set you will be able to: 

(1) use Formula 17 to derive the standard error of the mean from 
data in one sample. 



(2) determine the probability that a sample mean will lie between 
any two given values when given the raw scores in one sample 
and hypothesizing a value for the population mean. 

(3) state the generally accepted cut-off level of probability for 
accepting a hypothesis. 

(4) calculate the 5% confidence interval from the data in one sample,
hypothesizing a value for the population mean.

(5) identify the symbol s_X̄.
1. In the earlier discussion, the standard error of the mean (σ_X̄) was
obtained by taking the X̄'s of all possible samples of a certain size
and computing their ____ deviation.


standard 


2. Generally, you do not have a large number of X̄'s to use in the
computation of σ_X̄. However, with the data in one sample, you can use
Formula 17 to estimate the standard error of the ____.


mean 


3. FORMULA 17. The symbol for the estimate of the standard error
of the mean, from sample data, is ____ (symbol).


s_X̄


4. FORMULA 17. The symbol, s_X̄, is used in this formula because it
is an estimate of the standard error of the mean, derived from
____ (sample/population) data.


sample 


5. The s_X̄ is a ____ (statistic/parameter).


statistic


6. The s_X̄ is a statistic because it is derived from the data of a
____ (sample/population).


sample 









7. In Formula 17, the symbol s represents an estimate of the standard 
deviation of_scores in one sample. 


raw 

8. In Formula 17, the symbol N represents the_of raw scores 

in the sample. 


number 

9. In Formula 17, the two statistics of a sample that you need in order
to compute the s_X̄ are the ____ (symbol) and the ____ (symbol).


s, N

10. Formula 17 indicates that the s_X̄ is calculated by dividing ____ (symbol)
by the square root of ____ (symbol).

s, N

11. Consider a sample in which N = 100, X̄ = 12, and s = 16. Set up the
computation for s_X̄ by substituting the numerical values for the
symbols in Formula 17.

s_X̄ = (    ) / √(    )


s_X̄ = 16 / √100


12. Sample: N = 100, X̄ = 12, s = 16.

FORMULA 17. Do the necessary computation to obtain the s_X̄.
s_X̄ = ____



1.6 
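
The computation in frames 10 through 12 (divide s by the square root of N) can be written as a short Python sketch. The function name is an assumption made for illustration.

```python
import math


def standard_error_of_mean(s, n):
    """Estimate the standard error of the mean from one sample: s / sqrt(N)."""
    return s / math.sqrt(n)


# Sample from the frames above: N = 100, s = 16.
print(standard_error_of_mean(16, 100))  # 1.6
```

Because N appears under a square root, quadrupling the sample size only halves the standard error.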
13. Sample: N = 100, X̄ = 12, s = 16.

The standard error of the mean s_X̄ is 1.6. This means that the
standard deviation of a population of sample X̄'s, with each N = 100,
is ____.


1.6 


14. Sample: N = 100, X̄ = 12, s = 16, s_X̄ = 1.6.

If the sample from which these statistics are derived is
representative of the population, we can infer population ____
from them.


parameters 


15. When you use statistics to make an inference regarding a
population ____ (parameter/statistic), from a sample
____ (parameter/statistic), it is called a statistical
inference.


parameter, statistic 


16. The process of using a sample statistic for making inferences
regarding a population parameter is called statistical ____.


inference 


17. You know that a sample generally is not perfectly representative of 
a population. This is due to_error. 


sampling 


18. The statistic that you have just learned which gives you a measure of
the error involved in sampling is called the ____
____ of the ____ (symbol).


standard error, X̄
19. You have learned that the probability of obtaining a sample X̄ between
the μ and 1 s_X̄ is .3413. Thus, it is evident that you can determine
what the probability is of obtaining any specific sample X̄ if you know
the value of the ____ (symbol) and the value of ____ (symbol).


μ, s_X̄


20. However, when you do not know the value of the μ but have only the
value of X̄, you cannot determine the probability of the μ being any
specific value. This is because, unless your sample X̄ happens to be
identical with the population μ, your sample X̄ ____ (does/
does not) represent the midpoint of the distribution of sample X̄'s.


does not 


21. One method of making some statements regarding your sample X̄,
using your estimated s_X̄, is to choose a hypothetical value as your
best estimate for the μ and determine what the probability is of
obtaining your sample ____ (symbol).


X̄


22. Thus, if you hypothesize a value for the population μ, and have an
estimate of the s_X̄, you can make probability statements regarding
the value of any ____ (symbol).


sample X̄


23. Example: Suppose you hypothesize that the population μ is 10. From
the data in one sample, you determine that s_X̄ = 2. To determine
the probability that a sample X̄ lies, say, between 10 and 12, you
must first determine the area of the normal curve that lies between
10 and 12. If s_X̄ = 2, then between 10 and 12 lies ____ s_X̄.


one 
24. Thus, if you hypothesize a value for the population μ, and have an
estimate of the s_X̄, you can make probability statements regarding
the value of any ____ (symbol).


sample X̄


25. Example: Hypothesized μ = 10, s_X̄ = 2.

FROM TABLE 1. The probability that a sample X̄ lies between
10 and 12 is ____.

.3413


26. Example: Hypothesized μ = 10, s_X̄ = 2.

The probability that a sample X̄ lies between 10 and 12 is .3413
because the value 12 lies one ____ (symbol) above the hypothesized μ.


s_X̄


27. Example: Hypothesized μ = 10, s_X̄ = 2.

To determine the probability that a sample X̄ lies between, say, 8 and
12, first determine that: 8 lies ____ s_X̄ from the hypothesized μ;
12 lies ____ s_X̄ from the hypothesized μ.


-1, 1


28. Example: Hypothesized μ = 10, s_X̄ = 2. Use Table 1. (Recall that
P means "probability".) 8 lies -1 s_X̄ from the hypothesized μ, which
has P = ____. 12 lies 1 s_X̄ from the hypothesized μ, which
has P = ____.


.3413, .3413 










29. Example: Hypothesized μ = 10, s_X̄ = 2.

The probability that a sample X̄ lies between 8 and the μ is P = .3413.
The probability that a sample X̄ lies between the μ and 12 is P = .3413.
Therefore, the probability of obtaining a sample X̄ which lies between
8 and 12 is ____.


.6826 


30. Example: Hypothesized μ = 10, s_X̄ = 2.

Determine the probability of obtaining a sample X̄ between 6 and 11.
6 lies ____ from the hypothesized μ. 11 lies ____ from the
hypothesized μ.


-2 s_X̄, .5 s_X̄


31. Example: Hypothesized μ = 10, s_X̄ = 2.

Use Table 1. 6 lies -2 s_X̄ from the hypothesized μ, which has a
P = ____. 11 lies .5 s_X̄ from the hypothesized μ, which has
a P = ____.


.4772, .1915


32. Example: Hypothesized μ = 10, s_X̄ = 2.

FROM TABLE 1. The probability that a sample X̄ lies between 6 and
the hypothesized μ is P = .4772. The probability that a sample X̄ lies
between the hypothesized μ and 11 is P = .1915. Therefore, the
probability of obtaining a sample X̄ between 6 and 11 is ____.


.6687 
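
The Table 1 lookups in these frames can be reproduced with the normal distribution in Python's standard library. This is a sketch, assuming (as the frames do for large samples) that sample means are normally distributed around the hypothesized μ with standard deviation s_X̄. The function name is an assumption made for illustration.

```python
from statistics import NormalDist


def prob_mean_between(low, high, mu, se):
    """Probability that a sample mean falls between low and high.

    mu is the hypothesized population mean; se is the estimated
    standard error of the mean.
    """
    dist = NormalDist(mu=mu, sigma=se)
    return dist.cdf(high) - dist.cdf(low)


# Hypothesized mean 10, standard error 2, as in the examples above.
print(round(prob_mean_between(10, 12, mu=10, se=2), 4))  # 0.3413
print(round(prob_mean_between(6, 11, mu=10, se=2), 4))   # 0.6687
```

Values computed this way can differ from sums of table entries in the last decimal place, because each table entry is already rounded to four places.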


33. Using Table 1, you can determine what the probability is that a
sample X̄ lies between any two values, if you hypothesize a value
for ____ (symbol) and have an estimate of the value of ____ (symbol).


μ, s_X̄


34. Conversely, you can select a probability value and then calculate the
two score values between which the sample ____ (symbol) lies.


X̄


35. Example: Hypothesized μ = 10, s_X̄ = 2.

Take the probability value P = .8664. Because a sample X̄ has equal
probability of lying either above or below the hypothesized μ, you
must first divide the P value by ____.


2


36. Example: Hypothesized μ = 10, s_X̄ = 2.

For P = .8664, the probability that the sample X̄ lies above the
hypothesized μ is P = ____. Also, the probability that the
sample X̄ lies below the hypothesized μ is P = ____.


.4332, .4332


37. Example: Hypothesized μ = 10, s_X̄ = 2.

To determine how far a sample X̄ deviates from the hypothesized
μ at the probability level of P = .4332, use Table 1 in reverse. Locate
.4332 in the second column of the table and notice that the s_X̄ value
associated with it is ____.


1.50


38. Example: Hypothesized μ = 10, s_X̄ = 2.

From Table 1, you determined that 1.5 s_X̄ was associated with
P = .4332. For this example, if the hypothesized μ = 10 and s_X̄ = 2,
then the value at -1.5 s_X̄ is ____, and at 1.5 s_X̄ is ____.


7, 13 
39. Example: Hypothesized μ = 10, s_X̄ = 2.

For P = .8664: Below the hypothesized μ, P = .4332 is -1.5 s_X̄, which
is 7. Above the hypothesized μ, P = .4332 is 1.5 s_X̄, which is 13.
Therefore, the probability that the sample X̄ lies between ____ and ____
is P = .8664.


7, 13


40. Example: Hypothesized μ = 10, s_X̄ = 2.

The probability that a sample X̄ lies between 7 and 13 is P = .8664.
7 and 13 indicate what is called a confidence interval because you
can state with confidence that in 86.64% of the instances where the
μ = 10 and s_X̄ = 2, the ____ (symbol) lies between 7 and 13.


X̄


41. Example: Hypothesized μ = 10, s_X̄ = 2.

For P = .8664, the confidence interval is 7 to 13. Thus, if you
hypothesize that μ = 10, then the probability that a sample X̄ will lie
between 7 and 13 is P = ____.


.8664


42. Example: Hypothesized μ = 10, s_X̄ = 2.

Hypothesis: A sample X̄ lies between 7 and 13. If the probability
that this hypothesis is correct is P = .8664, then the probability that
it is incorrect is P = ____.


.1336 


43. If the probability that a hypothesis is incorrect is P = .1336, this 
probability is too large for most researchers to accept. They usually 

demand a_ (smaller/larger) probability of being 

incorrect. 


smaller 













44. Researchers generally will accept a hypothesis as being correct if
the probability of its being incorrect is only P = .05. If you make a
hypothesis that the X̄ lies between one score value and another, and
the probability of being incorrect is P = .05, the interval between the
two score values represents the 5% confidence ____.


interval


45. If you determine the two score values which represent the 5%
confidence interval, the probability that a ____ (symbol) lies within this
interval is P = .95.


X̄



Plate 65 (figure: normal curve with the central area shaded; tail area labeled .0250)


46. The shaded area of this plate gives the area defining the 5%
confidence interval, because the probability that an X̄ lies outside this
interval is .05. The probability that an X̄ lies within this interval is
____.


.95 
47. PLATE 65. For the 5% confidence interval, the probability that
X̄ lies above the μ is P = ____, and the probability that X̄ lies
below the μ is P = ____.


.4750, .4750


48. PLATE 65. For the 5% confidence interval, the probability that X̄
lies above the μ is .4750. Entering column 2 of Table 1 for P = .4750,
the X̄ can lie ____ s_X̄ from the μ.


1.96


49. PLATE 65. For the 5% confidence interval, the probability that
X̄ lies below the μ is also .4750, which means X̄ can also lie
____ s_X̄ from the μ.


-1.96


50. PLATE 65. Thus, for the 5% confidence interval, an X̄ can lie from
____ s_X̄ to ____ s_X̄.


-1.96, 1.96


51. PLATE 65. The 5% confidence interval is from -1.96 s_X̄ to 1.96 s_X̄.
This interval is usually written ±1.96 s_X̄, which indicates that the
interval is from -1.96 s_X̄ to + ____ (symbol).


1.96 s_X̄


52. PLATE 65. The 5% confidence interval is represented by the two
score values which lie ± ____ (symbol) from the μ.


1.96 s_X̄
53. Example: Hypothesized μ = 10, s_X̄ = 2.

Determine the 5% confidence interval. The 5% confidence interval
is represented by the score values that lie ____ s_X̄ from the μ.


±1.96


54. Example: Hypothesized μ = 10, s_X̄ = 2.

Because 1 s_X̄ = 2 score points, then 1.96 s_X̄ = ____ score points.


3.92


55. Example: Hypothesized μ = 10, s_X̄ = 2.

The 5% confidence interval is ±1.96 s_X̄. ±1.96 s_X̄ lies 3.92 score
points above and below the μ. These two score values are ____
and ____.


13.92, 6.08


56. Example: Hypothesized μ = 10, s_X̄ = 2.

The score values 6.08 and 13.92 indicate the 5% confidence interval.
This means that if the hypothesized mean is 10 and the standard
error of the mean is 2, the probability that a sample mean lies
between 6.08 and 13.92 is P = ____.


.9500 
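
Using Table 1 "in reverse" corresponds to the inverse of the normal cumulative distribution. The sketch below shows this under the same large-sample assumption as the frames. The function name and its parameters are assumptions for illustration, and the inverse lookup uses Python's standard library rather than Table 1.

```python
from statistics import NormalDist


def normal_confidence_interval(mu, se, p_inside):
    """Score values around a hypothesized mean that enclose p_inside.

    The probability is split equally above and below mu, so the upper
    limit sits where the cumulative probability is 0.5 + p_inside / 2.
    """
    z = NormalDist().inv_cdf(0.5 + p_inside / 2)
    return mu - z * se, mu + z * se


low, high = normal_confidence_interval(mu=10, se=2, p_inside=0.95)
print(round(low, 2), round(high, 2))  # 6.08 13.92

low, high = normal_confidence_interval(mu=10, se=2, p_inside=0.8664)
print(round(low, 2), round(high, 2))  # 7.0 13.0
```

The same function gives the interval for any chosen probability; P = .95 corresponds to the 5% confidence interval described above.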


57. The 5% confidence interval is represented by ± ____ (symbol).


1.96 s_X̄
EXERCISES


1. Using Formula 17, calculate the standard error of the mean from
the following sets of sample data.

Sample (a) N = 81 s = 4.5

Sample (b) N = 144 s = 21

2. In Exercise 1(a) suppose that you hypothesize that the μ = 15. What
is the probability that a sample X̄ lies between 15 and 16? Between
14.5 and 15.5? Between 13.7 and 15.3? Calculate the 5% confidence
interval for this distribution.

3. In Exercise 1(b) suppose that you hypothesize that the μ = 200. What
is the probability that a sample X̄ lies between 199 and 201? Above
203? Below 196.69? Calculate the 5% confidence interval for this
distribution.

4. What is the generally accepted cut-off level of probability for
accepting a hypothesis?

5. What does the symbol s_X̄ represent?
set 12

CONFIDENCE INTERVAL 
continued 


The 5% confidence interval is useful for stating with a specific degree 
of probability that a sample mean lies between two values. A more 
stringent confidence interval, with an accompanying higher probability, 
is the 1% confidence interval given in this set. 

The calculations of the 5% and 1% confidence limits using Table 1 have
been solely for use with large samples. When the number of raw scores
in a sample is small (approximately thirty or fewer), the normal distribution
does not accurately describe the variability of sample means. For each
sample size there is a unique distribution called the t distribution. The
nature of the t distribution is discussed in this set, as well as a method
for determining confidence intervals for data derived from small samples.

SPECIFIC OBJECTIVES OF SET 12 

At the conclusion of this set you will be able to: 

(1) state the two confidence intervals generally employed by 
researchers. 





(2) calculate the 1% confidence interval from the data in one sample 
which has a large N. 

(3) calculate from Table 2 the 5% confidence interval for a sample 
with a small N. 

(4) calculate from Table 2 the 1% confidence interval for a sample 
with a small N. 

(5) describe the nature of the t distribution and its relationship to 
sample size. 
1. The 5% confidence interval is represented by ±1.96 s_X̄. This is
interpreted as the interval between a value that lies ____
(symbol) from the μ and a value that lies ____ (symbol)
from the μ.


-1.96 s_X̄, +1.96 s_X̄


2. The 5% confidence interval is that interval of score values within
which the probability that a sample X̄ lies is P = ____.


.9500


3. If you say that the 5% confidence interval for a set of data is score
values 5 to 15, the probability that an X̄ lies between 5 and 15 is
P = ____.


.9500


4. If you say that the 5% confidence interval for a set of data is score
values 5 to 15, the probability that an X̄ lies below 5 or above 15 is
P = ____.


.0500 


5. The 5% confidence interval is usually the smallest interval that 
researchers are willing to accept. They are not willing to accept 
a hypothesis that has a probability of being incorrect more than 
P = 


.0500 
6. A more stringent confidence interval that is commonly used is the
1% confidence interval. When you determine the 1% confidence
interval, you know that the probability of an X̄ lying within the
interval is P = ____. The probability that an X̄ lies above or below
the interval is P = ____.


.9900, .0100


7. For the 1% confidence interval, the probability that an X̄ lies within
the interval is P = .99. Therefore, the probability that an X̄ lies
within the confidence interval and above the μ is P = ____,
and the probability that an X̄ lies within the confidence interval and
below the μ is ____.


.4950, .4950


8. For the 1% confidence interval, the probability that an X̄ lies above
the μ is P = .4950, and below the μ is P = .4950.

TABLE 1. The s_X̄ value associated with .4950 is ____.


2.58


9. TABLE 1. The value associated with a probability of .4950 is 2.58.
Therefore, the 1% confidence interval lies ± ____ s_X̄.


2.58


10. The 1% confidence interval is represented by ±2.58 s_X̄. There is
a probability of P = ____ that any given X̄ lies within the
interval of score values designated as the 1% confidence interval.


.9900 


Plate 66 (figure: normal curve with the central area shaded; tail area labeled .0050)


11. The shaded area of this plate indicates the ____% ____ ____.


1% confidence interval


12. For the 1% confidence interval, the probability that the X̄ lies below
-2.58 s_X̄ is P = ____, and the probability that it lies above
2.58 s_X̄ is P = ____.


.0050, .0050


13. The probability that an X̄ lies within the 5% confidence interval is
P = .9500. The probability that an X̄ lies within the 1% confidence
interval is P = .9900. There is a greater probability of an X̄ lying
within the ____ (5%/1%) confidence interval.


1%
14. The two confidence intervals most commonly used by researchers
are the ____% and the ____% confidence intervals.


5, 1


15. A sample that contains a large number of scores (about thirty or
more) is called a large sample. A sample that contains a smaller
number of scores is called a ____ sample.


small


16. The X̄'s of the samples are normally distributed only in instances
where the samples are very large. Therefore, you can use Table 1
for determining probabilities only where the sample has a large
____ (symbol).


N


17. For samples that contain a small number of scores (small N) the
X̄'s of the samples are not ____ distributed.


normally 


18. Because thus far you have used Table 1 for determining probabilities,

the samples involved have been assumed to be_(large/ 

small). 


large 
19. The probabilities associated with the sampling distribution of X̄'s of
large samples, as presented in Table 1, are derived from the
____ distribution.


normal


20. The probabilities associated with the sampling distribution of X̄'s of
small samples differ from the normal distribution. Therefore,
Table 1 ____ (can/cannot) be used in determining probabilities
associated with X̄'s of small samples.


cannot


21. As you learned earlier, the sampling distribution of X̄'s of large
samples is in the form of the normal distribution. The sampling
distribution of X̄'s of small samples differs from the ____
distribution, and it is called a t distribution.


normal


22. The sampling distribution of X̄'s of small samples is called a
____ (symbol) distribution.


t


23. The shape of the t distribution varies according to the number of
degrees of freedom for a sample. The df associated with the X̄ of
a sample is N - ____.


1 
24. The t distribution differs for each sample size, depending upon
the degrees of freedom associated with the sample. Thus, the t
distribution is ____ (the same/different) for a sample
with df = 10 and a sample with df = 26.


different


Plate 67 (figure: t distributions compared with the normal distribution)

df = ∞ (normal distribution)
df = 25
df = 9
df = 1


25. This plate presents the t distribution for three differing dfs as
compared with the normal distribution. The shape of the t distribution
which differs the most from the normal distribution is for
df = ____.


one 


26. PLATE 67. The t distribution becomes more like the normal
distribution as the df of the sample ____ (increases/
decreases).


increases


27. PLATE 67. In this plate, the t distribution which most nearly
approximates the normal distribution is the distribution for df = ____.


25 
28. PLATE 67. As the df becomes very large, the t distribution
becomes similar to the ____ distribution.

normal

29. TABLE 2. This table presents the t values associated with each df
for P = .10, P = .05, P = .02, and P = .01. From this table, the t
value, when df is infinitely large (∞), is 1.960 for P = .05 and 2.576
for P = .01, which, you will recall, is the same as for the ____
distribution.


normal

30. TABLE 2. For P = .05, the t value associated with df = 19 is 2.093,
whereas the t value associated with df = 5 is ____.



2.571


31. TABLE 2. As the df decreases, the t values associated with the
probability levels ____ (increase/decrease).


increase

32. TABLE 2. For a sample with N = 18, df = ____.

17


33. TABLE 2. For a sample with N = 18, df = 17, the t value associated
with P = .05 is ____. The t value associated with P = .01
is ____.


2.110, 2.898 
34. TABLE 2. To determine the 5% confidence interval for samples
in which N = 18, you use the t value associated with df = ____,
which is ____.


17, 2.110


35. Sample: N = 18, df = 17.

TABLE 2. For P = .05, t = 2.110. Whereas, for large samples, you
used 1.96 in determining the 5% confidence interval, for small samples
with N = 18, you use ____ to determine the 5% confidence
interval.


2.110


36. Sample: N = 18, s_X̄ = 2. For P = .05, t = 2.110.

Because this sample is small, the 5% confidence interval is not
±1.96 s_X̄, but is ± ____ s_X̄.


2.110


37. Sample: N = 18, s_X̄ = 2.

Suppose you hypothesize a population μ = 10. The 5% confidence
interval is designated by ±2.110 s_X̄. Therefore, at -2.11 s_X̄ is score
value ____, and at 2.11 s_X̄ is score value ____.


5.78, 14.22


38. Sample: N = 18, s_X̄ = 2. Hypothesized μ = 10.

At -2.11 s_X̄ is score value 5.78. At 2.11 s_X̄ is score value 14.22.
Therefore, the 5% confidence interval for these data is from ____
to ____.



5.78, 14.22 
39. Sample: N = 18, s_X̄ = 2. Hypothesized μ = 10.

The 5% confidence interval is from 5.78 to 14.22. This means that
you could make a hypothesis that an X̄ is between 5.78 and 14.22
with the probability of being correct of P = ____.


.9500 
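
The small-sample procedure in frames 34 through 39 can be sketched in Python. This is an illustration only: the lookup dictionary below holds just the Table 2 values quoted in the frames of this set (it is not the full table), and the names `T_TABLE` and `t_confidence_interval` are assumptions.

```python
# t values quoted in the frames above, keyed by df and then by P.
# Only these entries are known from the text; the full Table 2 has more.
T_TABLE = {
    5: {0.05: 2.571},
    17: {0.05: 2.110, 0.01: 2.898},
    19: {0.05: 2.093},
}


def t_confidence_interval(mu, se, n, p_outside):
    """Confidence limits around a hypothesized mean for a small sample.

    Uses df = N - 1 to pick the t value, then goes t standard errors
    below and above the hypothesized mean.
    """
    df = n - 1
    t = T_TABLE[df][p_outside]
    return mu - t * se, mu + t * se


low, high = t_confidence_interval(mu=10, se=2, n=18, p_outside=0.05)
print(round(low, 2), round(high, 2))  # 5.78 14.22

low, high = t_confidence_interval(mu=10, se=2, n=18, p_outside=0.01)
print(round(low, 3), round(high, 3))  # 4.204 15.796
```

For the same data, the large-sample value 1.96 would give 6.08 to 13.92, so the t-based interval is wider.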


40. The 1% confidence interval for small samples may be determined in
exactly the same manner as the 5% confidence interval. Use Table 2
to determine the t value in the column headed P = ____ opposite
the df of the sample.


.01


41. Because the size of t increases as the df of the sample decreases,
it follows that the confidence interval is ____ (larger/
smaller) for small samples than for large samples.


larger


EXERCISES


1. What are the two confidence intervals generally employed by re¬ 
searchers? Which is the more stringent? 

2. Determine the 1% confidence interval for the following samples: 

(a) N = 121, s = 7.4. Hypothesized μ = 100.

(b) N = 64, s = 5.2. Hypothesized μ = 75.

3. What table should be used for estimating sy sample statistics 
when the sample has a small N? A large N? 

4. Determine the 5% and 1% confidence intervals for the following 
samples: 

(a) N = 20, $s_{\bar{X}} = 3.5$. Hypothesized μ = 50.

(b) N = 12, $s_{\bar{X}} = 4.11$. Hypothesized μ = 32.6.

5. What is the t distribution? How is it related to sample size?


set 13

NULL HYPOTHESIS
t DISTRIBUTION


When a researcher begins a study, he generally sets out to test the 
validity of a hypothesis. He may wish to discover whether or not a cer¬ 
tain treatment has a significant effect upon the behavior of people; or 
whether, under specified conditions, certain changes take place. In any 
case, he makes a prediction, or research hypothesis, which he then 
attempts to prove or disprove. The hypothesis that is used when making 
statistical analyses of data is somewhat different from the research 
hypothesis. This set will present the definition of a null hypothesis and 
explain how to determine whether to accept or reject it. It will also 
discuss sampling error in connection with the means of two samples. 
A new term, the standard error of the difference between means, will 
be defined and a method will be given for determining whether to accept 
or reject the null hypothesis on the basis of data from two samples. 

SPECIFIC OBJECTIVES OF SET 13 


At the conclusion of this set you will be able to: 



(1) state a research hypothesis as a null hypothesis. 

(2) state what is meant by the standard error of the difference 
between means. 

(3) make a determination about whether to accept or reject the null 
hypothesis, based upon the probability that the difference be¬ 
tween two sample means is due to sampling error. 

(4) identify the symbol $\sigma_{diff}$.


1. Research is generally concerned with determining probability levels
associated with hypotheses. The statement "The sample X̄ lies
between score value 30 and score value 50" could be taken as a
_____.


hypothesis 


2. If you have data for one sample drawn from a population, you can
make probability statements regarding the acceptability of a
_____ about the population.


hypothesis 


3. You decide whether or not to accept a hypothesis by determining
its _____ of being true.


probability 


4. Thus far you have been concerned with hypotheses regarding only 
one population. When you are concerned with whether or not two 

populations differ, you must make a_regarding 

the difference between the two populations. 


hypothesis 


5. Statistical hypotheses are generally stated in null form. The word
null means no. Thus, a null hypothesis states there is _____ difference
between two populations.


no


6. Here is a null hypothesis: "There is no difference between the
heights of tenth-grade boys and tenth-grade girls." This is the
same as saying "The difference between the heights of tenth-grade
boys and girls is equal to _____."


zero 


7. A hypothesis that states there is no difference between two popula¬ 
tions is called a_hypothesis. 


null 


8. If there is no difference between the heights of tenth-grade boys
and girls, you may say that, in regard to height, they come from
the same _____.


population 


9. Because it is not practical to measure the height of all tenth-grade
boys and girls, you would probably measure a _____ of each
group.


sample 


10. If there is no difference between the X̄'s of the two samples, you can
conclude that there is _____ difference between the heights of the
populations of tenth-grade boys and tenth-grade girls.


no 


11. You recall, however, that the X̄'s of two samples will seldom be
exactly the same, due to _____ error.


sampling


12. The X̄'s of two samples may differ somewhat, due to sampling error,
even when the samples are selected from the same _____.


population 


13. TABLE 2. In addition to the t values for P = .0500, the t values
for P = _____ are also presented in the fourth column of this
table.


.0100 


14. TABLE 2. The second and fourth columns of this table present the
t values for P = _____ and P = _____ for the differing df's, because
these two probability levels are those generally used in reporting
research findings.


.0500, .0100 


15. When you test the null hypothesis, you are asking this question:
"Is the difference between the X̄'s of the two samples larger than
can be expected due to sampling _____?"


error 


16. If the difference between the X̄'s of two samples is larger than
can be expected as a result of sampling error, then you reject the
null hypothesis and conclude that the two samples were selected
from two different _____.


populations


17. If you selected a pair of samples from a population, it is likely
that, due to sampling error, the X̄'s of the pair of samples would
_____.

differ 


18. If you selected a large number of pairs of samples, you could obtain
an X̄ difference score for each _____ of samples.


pair 


19. The X̄ difference score for two samples would be obtained by subtracting
the X̄ of the first sample from the _____ (symbol) of the
_____ sample for each pair.


X̄, second


20. Thus, if you subtracted the X̄ of the second sample from the X̄ of the
first sample, the X̄ difference score would be positive if the _____
(first/second) sample X̄ was larger than the _____ (first/second)
sample X̄.


first, second 


21. Therefore, in many pairs of samples, if you had randomly paired
your samples, you should have just as many positive X̄ difference
scores as you would have _____ X̄ difference scores.


negative 


22. The standard deviation of sample means, you recall, is called the
standard error of the mean. Therefore, the standard deviation of
the difference between sample X̄'s is called the standard error
of the _____.


difference


23. The standard error of the difference is the standard deviation of a
distribution of differences between sample _____'s (symbol).


X̄


Plate 68 

Sampling Distribution of Differences 



24. As indicated in this plate, the symbol for the standard error of the
difference is $\sigma_{diff}$, which represents a standard deviation of X̄
_____ scores.


difference 


25. PLATE 68. The X̄ of this distribution is zero. This indicates that
the X̄ of all the difference scores derived from pairs of samples
selected from one population is _____.



zero 


26. PLATE 68. The symbol $\sigma_{diff}$ indicates the standard error of the
difference, which is the _____ of
difference scores for all possible samples of a specific size.


standard deviation


27. PLATE 68. The symbol for the standard error of the difference
is _____.


$\sigma_{diff}$
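The idea in frames 17 to 27 can be tried out by simulation. The sketch below draws many pairs of samples from a single population, records one X̄ difference score per pair, and then looks at the mean and standard deviation of those scores. The population values, sample sizes, and number of pairs are hypothetical choices made only for illustration; they do not come from the program.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 65.0, 3.0   # hypothetical population, chosen for illustration
N1, N2 = 20, 20                # hypothetical sample sizes
PAIRS = 5000                   # number of pairs of samples to draw

def sample_mean(n):
    """Mean of n scores drawn at random from the one population."""
    return statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))

# One mean-difference score per pair of samples (first minus second).
differences = [sample_mean(N1) - sample_mean(N2) for _ in range(PAIRS)]

mean_of_differences = statistics.fmean(differences)   # should be close to zero
sd_of_differences = statistics.pstdev(differences)    # plays the role of sigma_diff
share_positive = sum(d > 0 for d in differences) / PAIRS

print(round(mean_of_differences, 2), round(sd_of_differences, 2), round(share_positive, 2))
```

With these assumed values, the mean of the difference scores should come out near zero, about half of the scores should be positive, and the standard deviation of the scores should be near POP_SD times the square root of (1/N1 + 1/N2), which is roughly 0.95.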


28. The distribution of X̄ differences follows the normal curve only
when the samples are _____ (large/small).

large 


29. Where the samples are small, the distribution of the differences
follows the _____ (symbol) distribution.


t 


30. PLATE 68. The curve in this plate is indicated by a dotted line
because the actual shape of the t distribution depends upon the degrees
of _____ available.


freedom 


31. PLATE 68. The shape of the t distribution curve varies with the
number of _____ of _____ available.


degrees, freedom 


32. PLATE 68. If the shape of the curve of the t distribution varies
with the number of df available, then the proportions of the area
under the curve also _____ with the number of df available.

vary


33. If you reject the null hypothesis, you conclude that the two samples
were selected from _____ (the same/different) population(s).


different 


34. If you accept the null hypothesis, you conclude that the two samples
were selected from _____ (the same/different) population(s).


the same 


35. In making a decision whether to accept or reject the null hypothesis,
you must first assume that the two sample X̄'s come from the same
population. When you make this assumption you are saying that
the difference between the X̄'s is due only to _____
error.


sampling 


36. Statisticians generally agree to use P = .0500 as the "cut-off" point
in accepting or rejecting the _____ hypothesis.


null 


37. Assume that you have X̄'s of two samples. If the probability of
obtaining a difference as large as you obtain between your samples
is P = .0500 or less, there is a very small probability that the two
samples are from the same population. Therefore, you should
_____ (accept/reject) the null hypothesis.


reject 











38. If the probability of obtaining a difference as large as you obtain 
between your samples is greater than P = .0500, there is a large 
probability that the two samples are from the same population and 
that the difference is due only to sampling error. Therefore, you 
should_(accept/reject) the null hypothesis. 


accept 
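The cut-off rule of frames 36 to 38 amounts to a single comparison. A minimal sketch follows; the function name `decide` and the sample probabilities are illustrative, not part of the program.

```python
def decide(p_value, cutoff=0.05):
    """Apply the cut-off rule: reject the null hypothesis when P <= cutoff."""
    return "reject" if p_value <= cutoff else "accept"

print(decide(0.05))   # reject (P equal to the cut-off still counts)
print(decide(0.20))   # accept
print(decide(0.01))   # reject
```

Passing `cutoff=0.01` applies the same rule at the more stringent level.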


Plate 69

t Distribution of X̄ Difference Scores

[Figure: t distribution curve, drawn as a dotted line, with dotted vertical lines at points A (left) and B (right). The probability of obtaining two X̄'s whose difference lies within the area between A and B is P = .9500; this is the area of non-significant difference between X̄'s and the area of acceptance of the null hypothesis. The probability of obtaining two X̄'s whose difference lies outside this area is P = .0500, divided into P = .0250 below A and P = .0250 above B; each tail is an area of significant difference between X̄'s and an area of rejection of the null hypothesis.]


39. This plate presents a diagram depicting the t distribution. The 
curve of the distribution is indicated by a dotted line because the 
actual shape of the t distribution depends upon the_(symbol). 


df


40. PLATE 69. Note points A and B in this plate. The probability of
obtaining an X̄ difference score that lies within the area designated
by these two points is P = _____.


.9500 


41. PLATE 69. The probability of obtaining an X̄ difference score that
lies below the area designated by A is P = _____.


.0250 


42. PLATE 69. The probability of obtaining an X̄ difference score that
lies above the area designated by B is P = _____.


.0250 


43. PLATE 69. The probability of obtaining an X̄ difference score that
lies either above or below the area designated by A and B is
P = _____.


.0500 (.0250 + .0250) 


44. PLATE 69. The probability is so very small (P = .0500) of your
obtaining an X̄ difference score that lies outside the area designated
by A and B that researchers are willing to _____ (accept/reject)
the null hypothesis if they obtain a difference that lies outside
this area.


reject 


45. If you accept the null hypothesis, you are in effect saying that the
X̄ difference you obtained is due to sampling error and the difference
is therefore _____ (significant/non-significant).


non-significant


46. If you reject the null hypothesis, you are in effect saying that the
X̄ difference score that you obtained is so large that it is not
due only to sampling error, and therefore the difference is
_____ (significant/non-significant).


significant 


47. PLATE 69. If you reject the null hypothesis, you are saying that
the difference between your sample X̄'s is significant. This is the
same as concluding that the two samples come from different _____.


populations 


48. PLATE 69. The lines drawn at points A and B are dotted, which
indicates that their placement along the base line varies with the
shape of the distribution curve, depending upon the _____ (symbol).


df


EXERCISES


1. A large number of college freshmen are randomly assigned to two 
groups which are to be taught elementary statistics by different 
methods. A final examination will be given. State the null hypothesis 
to be tested. 

2. A researcher makes the hypothesis that children who study in the 
morning should receive higher algebra test scores than children 
who study in the evening. A final examination is given at the end 
of the semester. State the null hypothesis to be tested, 

3. What is meant by the term standard error of the difference? 

4. If you designate P = .0500 as your acceptable level for significance, 
and your research study yields a significance level of P = .1300, 
would you accept or reject the null hypothesis? 

5. When you accept the null hypothesis, what statement can you make 
about the difference between your two sample means ? 

6. What does the symbol $\sigma_{diff}$ represent?



set 14

t-RATIO


In the previous set a method was presented for accepting or rejecting 
the null hypothesis on the basis of the standard error of the difference 
between pairs of sample means. However, it is rarely possible to 
select a large number of samples, determine the difference between 
their means, and distribute these differences. A method has therefore 
been devised for estimating the standard error of the difference between 
means from the data in just two samples. 

This set will present a method for "pooling" sums of squares in order
to estimate the standard error of the difference between means, and also
for determining whether or not two samples differ significantly by employing
the t-test of the significance of the difference between sample means.

The statistical concept of "significant difference" will also be defined. 

SPECIFIC OBJECTIVES OF SET 14 

At the conclusion of this set you will be able to: 

(1) using Formula 18, estimate the standard error of the difference
from data contained in two samples.

(2) conduct a t-test for the significance of the difference between
two sample means, using Formula 19.

(3) using Table 2, evaluate the level of significance of a t-ratio.

(4) determine whether to accept or reject the null hypothesis on
the basis of the t-test.

(5) identify the symbol $s_{diff}$.


1. If you had many samples selected from a population, you could
compute the difference between the X̄'s for each pair of samples,
and then calculate the standard deviation of the X̄ difference scores.
This would be called the standard error of the _____,
and its symbol would be _____.


difference, $\sigma_{diff}$


2. When you have only two samples, you may use the data contained
in them to estimate the standard error of the difference. Formula
18 gives the formula for estimating the standard error of the difference
when you have only two samples. The symbol for this estimate
is _____.


$s_{diff}$


3. FORMULA 18. In this formula, $\sum x_1^2$ indicates the sum of squares
for the first sample, and $\sum x_2^2$ indicates the sum of squares for the
_____ sample.


second 


4. FORMULA 18. The symbol $N_1$ indicates the number of scores in
the _____ (first/second) sample. The symbol $N_2$ indicates
the number of scores in the _____ (first/second) sample.


first, second 


5. FORMULA 18. The part of the formula that states $\sum x_1^2 + \sum x_2^2$
indicates that you "pool" the sum of squares of the two _____.


samples


6. FORMULA 18. The part of the formula that states $N_1 + N_2 - 2$
indicates that you "pool" the degrees of _____ of the
two samples.

Hint: $(N_1 - 1) + (N_2 - 1) = N_1 + N_2 - 2$


freedom 


7. Recall from Formula 13 that when you divide the sum of squares by 
df, you obtain the_. 


variance 


8. When you divide $\sum x^2$ by df, you obtain the variance. Thus, the part
of Formula 18 that states

$$\frac{\sum x_1^2 + \sum x_2^2}{N_1 + N_2 - 2}$$

indicates that you are "pooling" the _____ of the two
samples.


variances 


9. FORMULA 18. The entire formula indicates that you are estimating
$s_{diff}$ by combining the $s_{\bar{X}}$ of the two samples using the pooled
_____.


variances


Plate 70

Height of Tenth-Grade Children

1st Sample (Boys)                2nd Sample (Girls)
$N_1 = 26$                       $N_2 = 37$
$\bar{X}_1 = 66$ inches          $\bar{X}_2 = 67$ inches
$\sum X_1 = 1716$                $\sum X_2 = 2479$
$\sum X_1^2 = 113{,}338$         $\sum X_2^2 = 166{,}205$

10. This plate gives the data for two samples: the heights of tenth-grade
boys and the heights of tenth-grade girls. The null hypothesis that
is to be tested is: There is _____ difference between the heights of
tenth-grade boys and tenth-grade girls.


no 


11. PLATE 70. The null hypothesis to be tested is: There is no differ¬ 
ence between the heights of tenth-grade boys and tenth-grade girls. 
This is the same as saying that the true difference between these 
two groups is _. 


zero 


12. PLATE 70. The data indicate that the difference between the X̄'s of
these two samples is _____ inch(es).


1


13. PLATE 70. The question to be answered is: How probable is it
that the difference of 1 inch between the sample X̄'s is due to sampling
_____?


error 


14. When you determine the probability that your sample X̄ difference is
due to sampling error, you are testing the _____ hypothesis.


null 


15. PLATE 70. To test the null hypothesis, you must determine the $s_{diff}$.
To do this, you must first compute the _____ (symbol) for each
of the samples. (See Formula 18.)


$\sum x^2$


16. PLATE 70. Using Formula 12 for the computation of $\sum x^2$ by the
raw score method, for each sample substitute the appropriate values
for the symbols and do the necessary arithmetic:

Boys $\sum x_1^2$ = _____

Girls $\sum x_2^2$ = _____


$$\text{Boys: } \sum x_1^2 = 113{,}338 - \frac{(1716)^2}{26} = 82$$

$$\text{Girls: } \sum x_2^2 = 166{,}205 - \frac{(2479)^2}{37} = 112$$


17. PLATE 70. Boys $\sum x_1^2 = 82$, Girls $\sum x_2^2 = 112$.

To determine the $s_{diff}$, substitute the appropriate values for the
symbols in Formula 18.

$$s_{diff} = \sqrt{\left(\frac{\sum x_1^2 + \sum x_2^2}{N_1 + N_2 - 2}\right)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$


$$s_{diff} = \sqrt{\left(\frac{82 + 112}{26 + 37 - 2}\right)\left(\frac{1}{26} + \frac{1}{37}\right)}$$
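The two sums of squares carried into Formula 18 come from the raw-score computation of frame 16, which can be reproduced directly from the Plate 70 totals. The function name `sum_of_squares` is an illustrative choice.

```python
def sum_of_squares(sum_x, sum_x_squared, n):
    """Raw-score method: sum of squared deviations = sum(X^2) - (sum(X))^2 / N."""
    return sum_x_squared - sum_x ** 2 / n

# Totals from Plate 70.
boys = sum_of_squares(1716, 113338, 26)
girls = sum_of_squares(2479, 166205, 37)
print(boys, girls)  # 82.0 112.0
```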


18. PLATE 70. $s_{diff} = \sqrt{(3.18)(.065)}$

Do the necessary arithmetic to determine the value of $s_{diff}$.


$s_{diff} = .4546$



19. PLATE 70. $s_{diff} = .4546$.

This indicates that the standard deviation of a large number of
differences between pairs of sample X̄'s is _____.


.4546 
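The Formula 18 arithmetic of frames 17 to 19 can be sketched as follows. The function name is illustrative. One caution: carrying full precision gives a value near .4564, while the .4546 used in the frames appears to come from rounding the two factors (to 3.18 and .065) before multiplying; that explanation is an inference from the numbers, not something the program states.

```python
import math

def standard_error_of_difference(ss1, ss2, n1, n2):
    """Formula 18 as used in frame 17: pooled variance times (1/N1 + 1/N2), square-rooted."""
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

# Plate 70 values: sums of squares 82 and 112, sample sizes 26 and 37.
s_diff = standard_error_of_difference(82, 112, 26, 37)
print(round(s_diff, 4))   # 0.4564 at full precision

# Rounding the two factors first reproduces the value used in the frames.
rounded = math.sqrt(round(194 / 61, 2) * round(1 / 26 + 1 / 37, 3))
print(round(rounded, 4))  # 0.4546
```

The small discrepancy does not change the conclusion reached later in the set; the frames continue with .4546.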


20. PLATE 70. The difference between the X̄ heights of boys and girls
is _____.


1 


21. To determine the probability of obtaining a difference score of 1,
you must compute a t-ratio. Formula 19 gives the formula for determining
the t-ratio. The symbol t is used in this formula because
the two samples have _____ (small/large) N's.


small 




22. FORMULA 19. The title of this formula indicates that it is for
_____ samples. The meaning of this term will
be fully discussed later.


independent 


23. FORMULA 19. This is called a t-ratio because it is a ratio between
the difference of the two sample X̄'s and the _____
of the _____.


standard error, difference 


24. PLATE 70. You determined that $s_{diff} = .4546$. Using Formula 19,
calculate the t-ratio by substituting the numerical values for the
symbols and doing the necessary arithmetic.


$$t = \frac{67 - 66}{.4546} = \frac{1}{.4546} = 2.1997$$
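The ratio in frame 24 is a one-line computation. In this sketch the function name `t_ratio` is illustrative, and the standard error is the .4546 carried over from frame 19.

```python
def t_ratio(mean1, mean2, s_diff):
    """Formula 19 as applied in frame 24: difference between means over s_diff."""
    return (mean1 - mean2) / s_diff

t = t_ratio(67, 66, 0.4546)           # girls' mean minus boys' mean, as in the frame
print(round(t, 4))                    # 2.1997

# Subtracting in the other order only changes the sign of the ratio.
print(round(t_ratio(66, 67, 0.4546), 4))  # -2.1997
```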


25. PLATE 70. t = 2.1997.

To interpret the meaning of this t-ratio, you must first determine
the df associated with it. The df for the boy sample is $N_1 - 1$, or
_____. The df for the girl sample is $N_2 - 1$, or _____.


25, 36 


26. As shown below Formula 19, the degrees of freedom associated with
this t-ratio is $N_1 + N_2 - 2$. This is algebraically the same as
$(N_1 - 1) + (N_2 - 1)$, which is simply the sum of the _____'s (symbol)
associated with the two samples.


df 


27. PLATE 70. t = 2.1997, $N_1 = 26$, $N_2 = 37$.

df = $N_1 + N_2 - 2$. For these data, df = _____.


61 (26 + 37 - 2)


28. PLATE 70. t = 2.1997, df = 61.

From Table 2, the value of t required for P = .05 with df = 61 is
_____. (Use the nearest listed df in the table.)


2.000 

(the nearest listed df is for df = 60) 
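The "nearest listed df" step of frame 28 can be written as a small lookup. The dictionary below is only an excerpt holding the three P = .05 values quoted in this program (for df = 17, 21, and 60); a real Table 2 has many more rows, and the names `T_AT_05` and `nearest_listed_df` are illustrative.

```python
# Critical t values at P = .05 (two-tailed) quoted in the text;
# a full Table 2 would list many more rows.
T_AT_05 = {17: 2.110, 21: 2.080, 60: 2.000}

def nearest_listed_df(df, listed):
    """Return the listed df closest to the df of the data."""
    return min(listed, key=lambda row: abs(row - df))

df = 26 + 37 - 2                       # 61, from N1 + N2 - 2
row = nearest_listed_df(df, T_AT_05)   # 60 is the closest listed row
print(df, row, T_AT_05[row])           # 61 60 2.0
```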


29. The X̄'s of two samples are said to be significantly different (and
hence, come from different populations) if the probability of obtaining
the difference that you did is equal to or less than P = .0500. Table 2
presents the t-ratios corresponding to P = _____, _____,
_____, and _____, for differing df's.


.1000, .0500, .0200, .0100 


30. TABLE 2. The values of t required to reach the probability level of
P = .0500 are presented because P = .0500 is the largest probability
that most statisticians will accept as indicating a significant difference
between sample _____'s (symbol).


X̄


31. TABLE 2. If an obtained t-ratio is larger than the tabled value of
t for P = .0500, you conclude that the two samples have been selected
from different _____.


populations 


32. In testing the null hypothesis, if the t-ratio is larger than the tabled
value of t at P = .0500, the difference between the two samples is
said to be _____ (significant/non-significant).


significant


33. If your t-ratio indicates that the X̄'s of the samples are significantly
different, you reject the null hypothesis, because the null hypothesis
states that the two samples come from the same _____.


population 


34. PLATE 70. t = 2.1997.

From Table 2, the t-ratio required for P = .0500 is _____.


2.000 


Plate 71

[Figure: t distribution with the t-ratio marked along the base line at t = -2.000, 0, and t = 2.000.]


35. This plate presents the t distribution for df = _, which is the 

distribution appropriate for Plate 70. 


61 


36. PLATE 71. For df = 61, the probability of obtaining a t-ratio above
t = 2.000 is P = _____.


.0250


37. PLATE 71. For df = 61, the probability of obtaining a t-ratio below
t = -2.000 is P = _____.


.0250 


38. PLATE 71. For df = 61, the probability of obtaining a t-ratio either
above t = 2.000 or below t = -2.000 is P = _____.


.0500 


39. PLATE 71. For df = 61, the probability of obtaining a t-ratio below
t = -2.000 or above t = 2.000 is P = .0500. Therefore, for this df, the
probability of an X̄ difference score yielding a t-ratio as large as
either plus or minus 2.000 is _____.


.0500 


40. PLATE 70. t = 2.1997, df = 61.

From Table 2, the t-ratio required for P = .0500 is 2.000. The
t-ratio obtained from the data in Plate 70 _____ (is/is not) as large
as or larger than is required for P = .0500.


is 


41. PLATE 70. t = 2.1997, df = 61.

This t-ratio is larger than t = 2.000, which is required for significance
at P = .0500. Therefore, you _____ (accept/reject) the
null hypothesis that there is no difference between the populations
from which these two samples were selected.


reject 









EXERCISES


1. Calculate the standard error of the difference, using Formula 18, and
the t-ratio, using Formula 19, for the means of the following two
groups.

Group A                          Group B
N = 13                           N = 19
X̄ = 7                            X̄ = 9.4
$\sum x^2 = 96$                  $\sum x^2 = 112$

2. Using Table 2, evaluate the significance of the t-ratio obtained in
Exercise 1. Would you reject the null hypothesis at P = .0500?
At P = .0100?

3. For the following two groups, calculate the standard error of the
difference, using Formula 18. Using Formula 19, calculate the
t-ratio.

Group X                          Group Y
N = 11                           N = [illegible]
X̄ = 7.6                          X̄ = 10.9
$\sum x^2$ = [illegible]         $\sum x^2 = 98$

4. Using Table 2, evaluate the significance of the t-ratio obtained in
Exercise 3. Would you reject the null hypothesis at P = .0500?
At P = .0100?

5. Identify the symbol $s_{diff}$.


set 15

ONE- AND TWO-TAILED TESTS 
TYPE I AND TYPE II ERRORS


Sometimes a researcher makes a hypothesis which states that one 
of his experimental groups will show the greatest increase. At other 
times he is not willing to hypothesize which group will change the most. 
The type of research hypothesis he makes determines how he can evaluate 
the significance of the difference he obtains. In this set the difference 
between a one- and a two-tailed test of significance will be explained, as 
well as how they are conducted. 

Because of the sampling error involved in using sample data, a 
researcher can never be absolutely certain that he has made the right 
choice in accepting or rejecting the null hypothesis. If he accepts the 
null hypothesis, there is always some measure of probability that he has 
been wrong in his decision. If he rejects it, there is also some probabil¬ 
ity that this decision is in error. These two types of possible errors 
and their probability of occurrence are discussed in this set. 

SPECIFIC OBJECTIVES OF SET 15 

At the conclusion of this set you will be able to: 





(1) determine which research hypotheses require one-tailed and 
which require two-tailed tests of significance. 

(2) use Table 2 to determine probability levels for one- and two- 
tailed tests of significance. 

(3) state what is meant by a Type I error. 

(4) state what is meant by a Type II error.

(5) state how the significance level is related to Type I and Type II
errors.



1. PLATE 71, page 197. Notice that the probability of P = .0500 is
divided, with the left "tail" of the distribution containing P = _____
and the right "tail" of the distribution containing P = _____.


.0250, .0250 


2. PLATE 71, page 197. Because you are concerned with the probabilities
in the two "tails" of the distribution, this is called a _____-tailed
test of significance.


two 


3. A two-tailed test of significance is concerned with the probability
that the obtained X̄ difference score lies in _____ (one/either)
tail of the distribution.


either 


4. If a "two-tailed" test of significance is concerned with the probability
that an obtained X̄ difference score lies in either tail of the distribution,
it follows that a "one-tailed" test of significance is concerned
with the probability that an obtained X̄ difference score lies in
_____ tail of the distribution.


one 


5. A discussion of the types of hypotheses is needed. Recall that the
null hypothesis states that there is _____ difference between two
groups.


no 


6. A research hypothesis may state that there is a group difference.
Thus, a hypothesis that states "Tenth-grade boys and tenth-grade
girls differ in their heights" is a _____ hypothesis.


research


7. The research hypothesis "Tenth-grade boys and tenth-grade girls
differ in their heights" does not imply the direction of the difference;
that is, whether boys are taller than girls or vice versa. It is only
concerned with whether there is a difference in either direction.
To "test" this hypothesis, you would use a _____ (one/two) tailed
test.


two 


8. If a research hypothesis does not hypothesize the direction of the
difference (that is, whether one specific group is higher than another),
the appropriate test of significance is the two-tailed test. If it does
hypothesize the direction of the difference, the appropriate test of
significance is the _____-tailed test.



one 


9. The research hypothesis "Intelligence test scores of urban children
differ from intelligence test scores of rural children" requires a
_____-tailed test of significance.


two 


10. The research hypothesis "Intelligence test scores of urban children
differ from intelligence test scores of rural children" requires a
two-tailed test of significance because it does not hypothesize the
_____ of the difference.


direction 


11. The research hypothesis "Intelligence test scores of urban children
are higher than the intelligence test scores of rural children" requires
a _____-tailed test of significance.


one 


12. The research hypothesis "Intelligence test scores of urban children 
are higher than the intelligence test scores of rural children" re¬ 
quires a one-tailed test of significance because the_ 

of the difference is hypothesized. 


direction


13. PLATE 70, page 192. The test of significance that was made for
these data was a _____-tailed test because the direction of the difference
_____ (was/was not) hypothesized.


two, was not 


14. PLATE 70, page 192. If, before the collection of the data, the researcher
had made the research hypothesis "Tenth-grade boys are
taller than tenth-grade girls," he could have made a _____-tailed
test of significance.


one 


15. In making a one-tailed test, you perform the t test in exactly the
same manner as with the two-tailed test. However, in evaluating
the significance of the t-ratio for a one-tailed test, you use only
_____ tail of the t distribution in determining the probability.


one 


Plate 72 



16. This plate presents the t distribution for df = 61. For the one-tailed
test, notice that the probability level (P = .0500) is indicated on
_____ (one/two) tail(s) of the distribution.


one


17. PLATE 72. The area of the curve that designates the .05 level of
significance in a one-tailed test is designated at the right tail of
the curve. Recall, from Plate 71, that for a two-tailed test, the .05
level of significance is divided between the _____ tails of the
distribution.


two 


18. TABLE 2. This table presents t values that designate probability
levels for both tails of the distribution. Therefore, these probability
levels, as listed at the top of the table (P = .1000, P = .0500, P = .0200,
P = .0100), are for _____-tailed tests.


two 


19. TABLE 2. The probability levels listed are for two-tailed tests.
To determine the probability levels appropriate for one-tailed tests,
you must divide the listed P values by 2. Thus, the t values for
P = .05 for a one-tailed test appear in the column headed P = _____.


.1000 


20. TABLE 2. For a one-tailed test, the P values of this table must be
divided by 2. Therefore, for a one-tailed test, the heading of the
columns should be: P = .0500, P = .0250, P = _____, and
P = _____.


.0100, .0050 


21. To determine the t-ratio required at the .0500 level of significance
for a two-tailed test, use the column headed P = .0500. To determine
the t-ratio required at the .0500 level of significance for a one-tailed
test, use the column headed P = _____.


.1000


22. PLATE 70, page 192. TABLE 2. df = 61.

The t-ratio required for significance at the .0500 level, for a two-tailed
test, is 2.000 (in column P = .0500). The t-ratio required
for significance at the .0500 level for a one-tailed test is _____
(in column P = _____).


1.67, .1000 


23. PLATE 70. df = 61.

For a two-tailed test, the t value required for P = .0500 is 2.000.
For a one-tailed test, the t value required for P = .0500 is 1.670.
Thus, the required value of t for P = .0500 is _____
(smaller/larger) for a one-tailed test than for a two-tailed test.


smaller 


24. Recall that you can only make a one-tailed test of significance when 
your research hypothesis states the direction of the difference. When 

there is no direction hypothesized, you must use a_-tailed 

test. 


two 


25. TABLE 2. For df = 21, the t-ratio required for significance at
P = .0500 for a two-tailed test is _____. The t-ratio required
for a one-tailed test is _____.


2.080, 1.721 


26. TABLE 2. For df = 30, the t-ratio required for significance at
P = .0100 for a two-tailed test is _____. The t-ratio required
for a one-tailed test is _____.


2.750, 2.457 (from column P = .02)


27. Recall that the t-ratio is obtained in exactly the same manner
(Formula 19) for a one-tailed test or a two-tailed test. The difference
between the two types of tests is in the required value of _____
(symbol) for significance at the different probability levels.


t 
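The column rule of frames 19 to 26 (double the one-tailed P to find the right two-tailed column) can be sketched as a lookup. The dictionary holds only the Table 2 entries quoted in these frames, not the full table, and the names `TABLE_2_EXCERPT` and `required_t` are illustrative.

```python
# Two-tailed critical values quoted in frames 22 to 26, keyed by df and then by
# the two-tailed column heading P. Only entries given in the text are included.
TABLE_2_EXCERPT = {
    21: {0.10: 1.721, 0.05: 2.080},
    30: {0.02: 2.457, 0.01: 2.750},
    60: {0.10: 1.670, 0.05: 2.000},
}

def required_t(df, p, tails=2):
    """Look up the t required for significance at level p."""
    # One-tailed test: double p to find the two-tailed column that holds the value.
    column = p if tails == 2 else round(2 * p, 4)
    return TABLE_2_EXCERPT[df][column]

print(required_t(21, 0.05, tails=2), required_t(21, 0.05, tails=1))  # 2.08 1.721
print(required_t(30, 0.01, tails=2), required_t(30, 0.01, tails=1))  # 2.75 2.457
```

In both rows the one-tailed requirement is the smaller of the two values, which is the point of frame 23.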


28. The research hypothesis "Rats that are fed a high protein diet will 
grow faster than rats that are not fed a high protein diet" permits 
a_-tailed test of significance. 


one 


29. The research hypothesis "There will be a difference in the arithmetic 
scores of fifth-grade children who are taught by method A and those 

who are taught by method B" permits a _-tailed test of 

significance. 


two 


30. On the basis of the statistical test, you either accept or _____
the null hypothesis.


reject 


31. Recall that there are two types of hypotheses: the research hypothesis,
and the statistical hypothesis, which is called the _____
hypothesis.


null 


32. The hypothesis that may state there is a difference between two 
groups is called a_hypothesis. 


research 
33. In addition to hypothesizing a difference between two groups, the
research hypothesis may also state the _____ of the
difference.


direction 


34. The statistical hypothesis "tested" by statistical methods is the null
hypothesis, which states there is _____ difference between the groups.

no 


35. When the research hypothesis does not state the direction of the 
difference, you use a_(one/two) tailed test. 

two 


36. When the research hypothesis does state the direction of the difference,
you use a _____ (one/two) tailed test.


one 

37. For either type of research hypothesis, the statistical hypothesis
to be "tested" is the _____ hypothesis.


null 


38. When using sample data, you never are able to accept or reject the 
null hypothesis with absolute certainty, because there is always 
the possibility that you are wrong, due to_error. 


sampling 
39. However, on the basis of a t test using sample data, you make the
decision either to accept or reject the null hypothesis. Researchers
generally make the decision to reject the null hypothesis if the
probability that their decision is incorrect is as small or smaller
than P = _____.


.0500 


40. The use of the probability of P = .0500 in making the decision to 

accept or_the null hypothesis is purely an arbitrary 

one, but it is the level of significance usually accepted by most 
researchers. 


reject 


41. You can make two types of error when you decide either to accept
or to reject the null hypothesis. Read the statements in Plate 73.

The two types of error are called _____ error and
_____ error.


Plate 73 

Type I Error: Rejecting the null hypothesis on the basis of sample data, 
when, in fact, no difference exists. 

Type II Error: Accepting the null hypothesis on the basis of sample data, 
when, in fact, a true difference exists. 


Type I, Type II 


42. PLATE 73. If, on the basis of your statistical test, you have accepted
the null hypothesis when, in fact, the two samples were selected from
different populations of scores, you have made a Type _____ (I/II)
error.


II 
43. PLATE 73. If, on the basis of your statistical test, you have rejected
the null hypothesis when, in fact, the two samples were
selected from the same population of scores, you have made a
Type _____ (I/II) error.


I 
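
The two definitions in Plate 73 amount to a small decision table, which can be sketched as follows. This is an illustrative sketch only; the function name classify_decision and its two boolean parameters are hypothetical and do not come from the original program.

```python
def classify_decision(null_is_true, rejected_null):
    """Label the outcome of a significance test as in Plate 73.

    null_is_true:  True when, in fact, no difference exists.
    rejected_null: True when the sample data led you to reject the null.
    """
    if rejected_null and null_is_true:
        # Rejected the null hypothesis although no difference exists.
        return "Type I error"
    if not rejected_null and not null_is_true:
        # Accepted the null hypothesis although a true difference exists.
        return "Type II error"
    return "correct decision"


# Frame 42: accepted the null, but the samples came from different populations.
print(classify_decision(null_is_true=False, rejected_null=False))
# Frame 43: rejected the null, but the samples came from the same population.
print(classify_decision(null_is_true=True, rejected_null=True))
```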


44. In rejecting a null hypothesis on the basis of sample data, there is
always some probability that you have made a Type I error, because
the difference that you find between samples may be due to
_____ error.


sampling 


45. If you reject the null hypothesis at the .0500 level of significance, the
probability that you have made a Type I error is only P = _____.


.0500


46. If you reject the null hypothesis at the .0100 level of significance, the
probability that you have made a Type I error is only P =_. 


.0100 


47. The smaller the level of significance, the _____ (smaller/
greater) the probability that you have made a Type I error.


smaller 
EXERCISES


1. What is the difference between a one-tailed and a two-tailed test of 
significance? 

2. Should a one-tailed or a two-tailed test of significance be employed 
when testing the following research hypotheses? 

(a) Students exposed to kinesthetic reading methods will show 
greater gains on the California Primary Reading Achievement 
Test than will a comparable group of students exposed to phonic 
reading methods. 

(b) Students involved in intensive counseling will show greater
gains on the Pilgrim Adjustment Inventory than will comparable
students not receiving such counseling.

(c) There will be a difference in the school drop-out rate for students
of differing social classes.

(d) Students who take a traditional social studies course will show a
different rate of progress in cognitive development than comparable
students taking the Smith Social Studies Curriculum.

3. Using Table 2, determine the t-ratio required for a two-tailed test
of significance for P = .0500 when df = 12. Determine the t-ratio
required for a one-tailed test.

4. Using Table 2, determine the t-ratio required for a two-tailed test
of significance for P = .0100 when df = 23. Determine the t-ratio
required for a one-tailed test.

5. For the same df, does a one-tailed test or a two-tailed test require
a larger t-ratio?

6. What is meant by a Type I error?

7. What is meant by a Type II error?

8. How is the size of the level of significance related to the probability
that, in rejecting the null hypothesis, you have made a Type II error?
A Type I error?
set 16

SIMPLE ANALYSIS OF VARIANCE 


The t test has provided a method for analyzing the difference between
two sample means. Occasionally a research design requires that statistical
techniques be applied to the data contained in three or more samples.

A common statistical technique which permits an analysis of the data 
in more than two samples at a time is called the analysis-of-variance 
technique. As implied by the name of this technique, it is concerned 
with analyzing the variance of the raw scores in the various samples. 
This set will indicate how the total variance contained in a number of 
samples can be divided into component parts and analyzed in order to 
determine the significance of the difference between the samples. The 
form of analysis of variance presented in this set and in set 17 is the 
simplest form involving three or more samples. 

SPECIFIC OBJECTIVES OF SET 16 

At the conclusion of this set you will be able to: 





(1) state the two components into which the total sum of squares is
divided in the analysis-of-variance technique.

(2) use Formulas 22 through 24 to compute the sums of squares for
the components in the analysis-of-variance test for the significance
of differences between three or more samples.

(3) use Formulas 25 and 26 to compute the degrees of freedom
associated with the components in the analysis-of-variance
technique.

(4) identify the symbols SS_t, SS_b, SS_w, and MS.
1. The t test, you will recall, is used to test the null hypothesis that
there is no difference between the mean scores of _____ populations.


two 


2. In research designs involving more than two samples you may also
wish to test the null hypothesis. If you have three samples, for
instance, the null hypothesis would state that there was no difference
between the mean scores of _____ populations.


three 


3. Since the t test is useful only for testing the difference between
two means, you must apply another statistical test, called analysis
of variance, in cases in which there are _____ or more samples.


three 








Plate 74 

An Experiment in Teaching Arithmetic 


The Centerville school wished to determine if certain methods of teaching
arithmetic were better than others. It devised three different methods
of teaching arithmetic and divided twenty-one children into three groups,
giving each group a different method. The three groups were designated
Group A, Group B, and Group C. These three groups of children did not
differ initially on arithmetic achievement. At the end of the school year
an arithmetic achievement test was given to each of the children in the
three groups, and the following scores were obtained:


     Group A          Group B          Group C

       94               90               86
       92               86               84
       90               84               80
       90               82               78
       86               82               78
       86               80               72
       84               78
       82

   X̄_A = 88       X̄_B = 83.143     X̄_C = 79.667


4. The experiment presented in Plate 74 is concerned with examining
the difference between _____ methods of teaching arithmetic.


three 
5. The null hypothesis being tested in this experiment is that there is no
_____ between the three methods of teaching
arithmetic.


difference 


6. Notice that in Plate 74 there is a mean score reported for each
group. Observe that the X̄'s of the three groups _____ (do/
do not) differ.


do 


7. Since the means of the three samples differ, your concern here
is determining whether these differences are large enough to represent
real differences or whether they are due to _____
error.


sampling 


8. When more than two samples are involved you can decide whether
or not to accept the null hypothesis by using the analysis of
_____ technique.


variance 


9. Plate 74 indicates that the raw scores within each of the three groups
_____ (do/do not) differ.


do 


10. Plate 74 also indicates that the means of the three groups
_____ (do/do not) differ.


do 
11. Comparing the variance among the means of the several groups
with the variance among the scores within the groups is the major
focus of the analysis-of-_____ technique.


variance 


Plate 75 

Analysis of Variance for Experiment Presented in Plate 74 

                   Sum of         Degrees of      Mean Square
                   Squares        Freedom         (Variance)          F

Between groups     SS_b = 245.8   df_b = 2        MS_b = 122.9      6.54

Within groups      SS_w = 338.2   df_w = 18       MS_w = 18.8

Total              SS_t = 584     df_t = 20


12. Plate 75 presents the usual method of reporting analysis-of-variance
tests. We shall examine this plate, which presents the statistical 
test of the data in Plate 74, and then we shall learn how it was 
computed. 


go on to next frame 


13. PLATE 75. The new symbols introduced in this plate can be readily
learned. SS means sum of squares, MS means _____,
and the subscripts b, w, and t indicate _____, _____,
and _____.

mean square, between, within, total 


14. PLATE 75. As indicated in the third column, the variance is indicated
by a new term called _____.


mean square 
15. The term mean square is used to indicate estimates of variance
in the analysis-of-variance technique. You will notice that two
mean squares have been computed: the mean square _____
groups, and the mean square _____ groups.


between, within 


16. The analysis-of-variance technique tests the difference between
these two mean squares. Formula 20 presents the formula for 
the analysis-of-variance test of significance, commonly known as 
the_(symbol) test. 


F 


17. Formula 20 indicates that in order to determine the value of F you

must divide the mean square_groups by the mean 

square_groups. 


between, within 


18. In Plate 75, the F ratio of 6.54 has been determined by applying
Formula 20. Thus, in this example, F was obtained by dividing 
_by_. 


122.9, 18.8 


19. We shall return later to determine the significance level of this F
ratio. Now let us determine how we obtained these mean squares.
Examine the first column in Plate 75, which presents three _____.


sums of squares 


20. PLATE 75. The first column reveals that the total sum of squares

can be broken down into two parts: the sum of squares_ 

groups, and the sum of squares_groups. 


between, within 
21. The fact that the sum of squares between the groups (SS_b) plus the
sum of squares within the groups (SS_w) equals the _____ sum
of squares is the basis for the analysis-of-variance technique.


total 


22. To understand this, let us examine the frequency distributions presented
in Plate 76. The first column presents the frequency distribution
of arithmetic scores for all three groups combined. The
next three columns present the frequency distribution for Group
_____, Group _____, and Group _____, respectively.


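Plate 76

Frequency Distributions of Arithmetic Scores for Groups A, B, and C
and for All Three Combined

[Figure: dot plots of the arithmetic scores (vertical axis, 72 to 94) in
four columns: Total of All Three Groups, Group A, Group B, and Group C.
Subject K is marked in Group A, the mean of each distribution is marked,
and a horizontal dotted line shows the level of the total mean.]
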

A, B, C 

23. PLATE 76. As can be seen, the mean score for the total distribution
is 84, and is designated by the symbol _____.

24. PLATE 76. Notice that the mean scores for the group distributions
are also indicated at points X̄_A, X̄_B, and X̄_C. The level of X̄_t is
indicated by the horizontal dotted line. It is evident from this plate
that the means of the groups _____ (do/do not) differ from X̄_t.


do 


25. Recall that it was stated that the sum of squares for the total (SS_t)
can be broken down into the sum of squares between groups (SS_b)
and sum of squares within groups (SS_w). This can be illustrated by
examining one of the scores in Plate 76. For example, take Subject
K in Group A. Subject K has a raw score of _____.


92 


26. PLATE 76. Subject K's score of 92 deviates from X̄_t by _____ points.


8 


27. Subject K's deviation of 8 points from X̄_t can be divided into two
portions:

(1) Subject K's score deviates from its own group mean (X̄_A) by
_____ points.

(2) The mean of Group A (X̄_A) deviates from the total mean (X̄_t)
by _____ points.


4, 4 


28. Thus, the deviation of Subject K's score from the total mean (X̄_t) is
exactly equal to the sum of its deviation from its group mean (X̄_A)
and the deviation of _____ (symbol) from X̄_t.

29. PLATE 76. It has been shown that the deviation of any one score
in Group A from X̄_t can be divided into two portions: (1) its deviation
from _____ (symbol) and (2) the deviation of _____ (symbol)
from _____ (symbol). This is true of the scores in all three groups.


30. FORMULA 21. By the same token the sum of squared deviations
(SS_t) of all the scores in the various groups can be divided into two
portions: (1) the sum of squares within the groups (SS_w), and (2) the
sum of squares _____ the groups (_____) (symbol).


between, SS_b
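
The partition described in the frames above can be checked numerically on the Plate 74 scores. The following Python sketch is an illustration only, not part of the original program, and all variable names are hypothetical. It first splits Subject K's deviation into its two portions, then confirms that the same split holds for every score and that the squared portions add up as Formula 21 is described.

```python
# Scores from Plate 74.
groups = {
    "A": [94, 92, 90, 90, 86, 86, 84, 82],
    "B": [90, 86, 84, 82, 82, 80, 78],
    "C": [86, 84, 80, 78, 78, 72],
}

all_scores = [x for scores in groups.values() for x in scores]
total_mean = sum(all_scores) / len(all_scores)

# Subject K: raw score 92 in Group A.
mean_a = sum(groups["A"]) / len(groups["A"])
k_within = 92 - mean_a           # deviation of the score from its group mean
k_between = mean_a - total_mean  # deviation of the group mean from the total mean
k_total = 92 - total_mean        # deviation of the score from the total mean
print(k_within, k_between, k_total)

# The same split for every score, accumulated as sums of squares.
ss_total = ss_within = ss_between = 0.0
for scores in groups.values():
    group_mean = sum(scores) / len(scores)
    for x in scores:
        within = x - group_mean
        between = group_mean - total_mean
        # Each score's total deviation is the sum of its two portions.
        assert abs((x - total_mean) - (within + between)) < 1e-9
        ss_total += (x - total_mean) ** 2
        ss_within += within ** 2
        ss_between += between ** 2

print(round(ss_total, 1), round(ss_within + ss_between, 1))
```

The squared portions add up exactly because, within each group, the deviations from the group mean sum to zero.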


31. The value of these sums of squares can be computed by using Formulas
22 through 24. Formula 22 presents the procedure for calculating
the total sum of squares. You will recognize that this
formula is identical with Formula 12. In our experiment in Plate
74, N_t = _____.


21 


32. Compute the total sum of squares (SS_t) for the data presented in
Plate 74. Use Formula 22. SS_t = _____.


584 


33. PLATE 74. SS_t = 584.

As indicated in Formula 23, you need first to calculate the sum of
squares within each group separately and then _____ them to obtain
SS_w.


sum 











34. When computing the sum of squares within a group, you apply the
same formula (Formula 22) to the data within each group. For
instance, in Group A, SS_A is computed using only the raw scores in
Group A, and N_A = _____.


8 


35. PLATE 74. Use Formula 22 to compute the within-group sum of
squares for each group. SS_A = _____; SS_B = _____; SS_C = _____.


120, 94.9, 123.3


36. PLATE 74. SS_A = 120, SS_B = 94.9, SS_C = 123.3.

Formula 23 presents the formula for obtaining the sum of squares
within groups. For the data in Plate 74, SS_w = _____.


338.2 
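
The within-group calculation can be reproduced with a short Python sketch. It assumes that Formula 22, which is not reproduced in this section, is the usual sum of squared deviations from the mean; the helper name sum_of_squares is hypothetical.

```python
def sum_of_squares(scores):
    """Sum of squared deviations of the scores from their own mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


# Scores from Plate 74.
groups = {
    "A": [94, 92, 90, 90, 86, 86, 84, 82],
    "B": [90, 86, 84, 82, 82, 80, 78],
    "C": [86, 84, 80, 78, 78, 72],
}

# Apply the same calculation to each group separately ...
ss_each = {name: sum_of_squares(scores) for name, scores in groups.items()}
# ... then sum the results to obtain the within-groups sum of squares.
ss_w = sum(ss_each.values())

# The total sum of squares uses all 21 scores and the total mean.
ss_t = sum_of_squares([x for scores in groups.values() for x in scores])

for name, value in ss_each.items():
    print(name, round(value, 1))
print("SS_w =", round(ss_w, 1), " SS_t =", round(ss_t, 1))
```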


37. PLATE 74. You now have computed the total sum of squares (SS_t)
and the within-groups sum of squares (SS_w). Formula 24 presents
two methods of computing the sum of squares between groups (SS_b).
Formula 24a is the easy way because, using it, you merely subtract
_____ (symbol) from _____ (symbol).


38. Formula 24b presents the computational formula for SS_b. Apply
this formula to the data in Plate 74. This formula indicates that
you square the deviation of each group mean from the _____
mean. Next, for each group, you multiply this squared deviation
by _____ of the group. Finally, you sum the products to obtain _____
(symbol).


total, N, SS_b


39. PLATE 74. Compute SS_b by Formula 24b. SS_b = _____.


8(88 - 84)^2 + 7(84 - 83.143)^2 + 6(84 - 79.667)^2 = 245.8


40. PLATE 74. SS_t = 584, SS_w = 338.2.

By Formula 24b, you know that SS_b = 245.8. To check the accuracy
of your calculations, you apply Formula 24a and find that SS_b = _____.


245.8 
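
The agreement between the two routes to SS_b can be confirmed with the rounded values given in the frames. This Python sketch is illustrative only and its variable names are hypothetical; it follows the verbal description of Formulas 24a and 24b above, since the formulas themselves are not reproduced in this section.

```python
# Group sizes and means as reported in Plate 74.
group_sizes = {"A": 8, "B": 7, "C": 6}
group_means = {"A": 88.0, "B": 83.143, "C": 79.667}
total_mean = 84.0

# Formula 24b, as described: square each group mean's deviation from the
# total mean, multiply by the group's N, and sum the products.
ss_b_direct = sum(
    n * (group_means[name] - total_mean) ** 2
    for name, n in group_sizes.items()
)

# Formula 24a, as described: subtract SS_w from SS_t.
ss_t = 584.0
ss_w = 338.2
ss_b_by_subtraction = ss_t - ss_w

print(round(ss_b_direct, 1), round(ss_b_by_subtraction, 1))
```

Because the group means are rounded to three decimals, the two results agree to one decimal place rather than exactly.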


41. PLATE 74. You have thus shown that SS_b plus SS_w equals
_____ (symbol).




42. PLATE 75. This plate shows the three sums of squares you have
just calculated. Formulas 28 and 29 indicate that, in order to compute
the mean squares from the sums of squares, you must first
determine the _____ associated with
each SS.


degrees of freedom 


43. Formulas 25 through 27 are used in determining the degrees of
freedom associated with each sum of squares. The same reasoning
applies with these df calculations as with the df calculations for the
t test. For example, the degrees of freedom for the total group
(df_t) is equal to the total _____ - 1.


N 
44. FORMULA 26. Since this formula is concerned with the sum of
squares between several groups, the degrees of freedom between
groups (df_b) is equal to the _____ of _____ - 1.


number, groups


45. FORMULA 27. Because the total degrees of freedom (df_t) must equal
the sum of df_b and df_w, it follows that df_w can be obtained by subtracting
_____ (symbol) from _____ (symbol).


df_b, df_t


46. Applying Formulas 25 through 27 to the experimental data in Plate
75, you find that when there are three groups, the df_b = _____.


2 


47. You also find that, since the total N of the combined groups is 21,
df_t = _____.


20 


48. Applying Formula 27, you find that df_w = _____.


18 


49. PLATE 75. You have now determined the sums of squares and
the degrees of freedom associated with them. Applying Formulas
28 and 29, you can easily determine the mean square between groups
(MS_b) and the mean square within groups (MS_w). MS_b = _____;
MS_w = _____.


122.9, 18.8 
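
The degrees of freedom and mean squares in Plate 75 follow directly from the sums of squares. The Python sketch below is an illustration under the descriptions given in the frames (each mean square is a sum of squares divided by its degrees of freedom, and F is the between-groups mean square divided by the within-groups mean square); the function name analysis_table is hypothetical.

```python
def analysis_table(ss_b, ss_w, n_total, n_groups):
    """Degrees of freedom, mean squares, and F for a simple analysis of variance."""
    df_t = n_total - 1    # total N minus 1
    df_b = n_groups - 1   # number of groups minus 1
    df_w = df_t - df_b    # what remains of the total df
    ms_b = ss_b / df_b    # mean square between groups
    ms_w = ss_w / df_w    # mean square within groups
    return {
        "df_t": df_t, "df_b": df_b, "df_w": df_w,
        "MS_b": ms_b, "MS_w": ms_w, "F": ms_b / ms_w,
    }


# Values from Plates 74 and 75: SS_b = 245.8, SS_w = 338.2, N = 21, 3 groups.
table = analysis_table(245.8, 338.2, n_total=21, n_groups=3)
for key, value in table.items():
    print(key, round(value, 2))
```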
50. PLATE 75. Having computed the two mean squares necessary in
the analysis-of-variance technique, you are now ready to examine
the relationship of these two mean squares. Recall that the term
mean square is identical with _____.


variance 


51. You have obtained two estimates of the variance of our population of 
arithmetic scores. One of these is the variance of the scores within 

the groups, indicated in Plate 75 by_(symbol). The other 

estimate of the population variance is the variance obtained between 
the several groups, indicated by_(symbol). 


MS_w, MS_b
EXERCISES


1. What are the two components into which the total sums of squares 
are divided in the analysis-of-variance technique? 

2. Thirty sixth-grade students were divided into three socioeconomic
groups and were administered a reading comprehension test. The
researcher had the hypothesis that there is a difference between
reading comprehension test scores for children of differing socioeconomic
backgrounds. The raw scores presented below were obtained
to test this hypothesis. Use Formulas 22 through 24 to
compute the sums of squares needed for the analysis-of-variance
technique.


     Group A          Group B          Group C

       21               20               24
       18               20               23
       17               19               23
       17               18               22
       17               17               20
       16               17               20
       15               14               18
       13               14               17
       12                9               16
        8                7
        7

3. For the data presented in Exercise 2, determine the degrees of 
freedom associated with each sum of squares, using Formulas 25 
through 27, and compute the necessary mean squares, using 
Formula 28. 

4. What do the symbols SS_t, SS_b, SS_w, and MS represent?
set 17

SIMPLE ANALYSIS OF VARIANCE 
continued 

F TEST FOR TWO VARIANCE ESTIMATES 


The ratio between the mean squares between groups (MS_b) and the
mean squares within groups (MS_w) is the basis of the analysis-of-
variance technique. The evaluation of this F-ratio, using the degrees
of freedom associated with the mean square estimates, permits the
researcher either to accept or to reject the null hypothesis that there
is no difference among the various sample means. Of course, if he
can accept the null hypothesis, he can state that there is no difference
among the various sample means. If the analysis-of-variance F test
permits the researcher to reject the null hypothesis, he then concludes
that there are significant differences between the various samples. This
set presents the method for computing and evaluating the F-ratio. Also,
in this set, there is another method of using the F-ratio to evaluate the
difference between two sample variances. The difference between one-
and two-tailed F tests is discussed.

SPECIFIC OBJECTIVES OF SET 17 

At the conclusion of this set you will be able to: 
(1) identify the symbol F.

(2) use Formula 20 to compute the F-ratio in the analysis-of- 
variance technique. 

(3) use Table 3 to evaluate the F-ratio for significance. 

(4) use Formula 31 to determine the F-ratio between two variance 
estimates, and Table 3 to evaluate it for significance. 



1. Recall that the null hypothesis for the experiment in Plate 74 states
that there is no difference between the arithmetic scores of Groups 
A, B, and C. Your calculations have shown that there is variability 
within the groups as well as_the groups. 


between 


2. If you can demonstrate that the variability between the groups is 
so much greater than the variability within the groups that there 
is a low probability that the between-group variance could have 

been a result of sampling_, then you may_ 

(accept/reject) the null hypothesis. 


error, reject 


3. The question, then, is whether the estimate of the population variance
obtained between the groups (MS_b) is sufficiently greater than the
estimate of the population variance obtained within the groups (MS_w)
to warrant our rejection of the _____.


null hypothesis 


4. To make a statistical decision regarding the relationship of MS_b and
MS_w we apply the F test as given in Formula 20. As seen in this
formula, F is expressed as a ratio of MS_b to _____ (symbol).


MS_w


5. Compute the F-ratio of the mean squares obtained in Plate 75.
F = _____.


6.54 
6. PLATE 75, page 217. F = 6.54.

As with the t test, you must determine the significance level for
this F-ratio. Unlike the t test, which had only one value for the
degrees of freedom, the F test has _____ values for df. These
are df_b and _____ (symbol).


two, df_w


7. To evaluate the significance of an F-ratio you use Table 3, which
presents the values of F at the .05 and .01 levels of significance.
This table indicates that the df for the greater MS is listed across
the top and the df for the smaller MS is listed along the side. Thus,
you must enter this table using both df_w and _____ (symbol).


df_b


8. TABLE 3. The light type in the body of this table indicates the 
value needed for significance at the .05 level. The boldface type 
gives the value needed for the_level. 


.01 


9. PLATE 75, page 217. In order to evaluate the significance of the
F-ratio of 6.54, you must enter Table 3 with df_b = _____ and df_w = _____.


2, 18 


10. PLATE 75, page 217. The larger of the two mean squares in this 

example is_(symbol). The degrees of freedom associated 

with it is 


MS_b, 2
11. PLATE 75, page 217. Thus, to determine the significance of our
F-ratio, you enter Table 3 with df = 2 along the top because it is
associated with the _____ (greater/lesser) mean square.
Locate this column in Table 3.


greater 


12. Likewise, you enter Table 3 with df = 18 along the side of the table
because it is associated with the _____ (greater/lesser)
mean square. Locate this row in Table 3.


lesser 


13. TABLE 3. At the intersection of the column (df = 2) and the row
(df = 18) you find that the F-ratio needed for P = .0500 is _____
and for P = .0100 is _____.


3.55, 6.01


14. PLATE 75, page 217. You have determined that, in order to be
significant at the .05 level, the F-ratio must be at least 3.55. In
order for it to be significant at the .01 level, it must be at least 6.01.

Your obtained F-ratio is 6.54. This value _____ (is/is not)
greater than is required for significance at the .01 level.


is 


15. Your F-ratio of 6.54 exceeds that required at the .01 level of significance.
Therefore, you are justified in _____
(accepting/rejecting) the null hypothesis that there are no differences
between the three methods of teaching arithmetic.


rejecting 
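
The comparison made in the last few frames can be written as a small rule. The sketch below is illustrative only; the function name significance_of_f is hypothetical, and the two critical values are simply those read from Table 3 in the frames above for df = 2 and df = 18.

```python
def significance_of_f(f_ratio, f_at_05, f_at_01):
    """Report the strictest level of significance an obtained F-ratio reaches."""
    if f_ratio >= f_at_01:
        return ".01"
    if f_ratio >= f_at_05:
        return ".05"
    return "not significant"


# Plate 75: F = 6.54; Table 3 (df = 2 and 18) requires 3.55 and 6.01.
level = significance_of_f(6.54, f_at_05=3.55, f_at_01=6.01)
print(level)

# Reaching either level justifies rejecting the null hypothesis at that level.
reject_null = level != "not significant"
print(reject_null)
```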
16. PLATE 75, page 217. The variance between the groups (MS_b) has
been shown to be sufficiently greater than the variance within the
groups (MS_w). Thus, you can confidently say that the three groups
of scores did not come from the same _____ of
scores.


population 


17. NOTE: You now know that there is a difference between the three
methods of teaching arithmetic. However, you do not know whether
there is a significant difference between all three methods, or
whether it is just between one method and the other two. The advanced
statistical techniques for making these determinations are
beyond the scope of this book.


go on to next frame 


18. In using the analysis-of-variance technique, there is a computationally
easier method for obtaining the sums of squares. This
method and its format are presented in Formula 30. As shown
in the boxes, this method requires that first you obtain, for each
group and for the total, the values of _____ (symbol), _____ (symbol),
and _____ (symbol).


ΣX, ΣX^2, N


19. FORMULA 30. Substituting these values in the algebraic formulas,
you first compute, as shown in Step 1, a _____ term
to be used in Steps 2 and 3.


correction 


20. FORMULA 30. Step 2 gives the formula for computing the total sum
of squares (SS_t). Step 3 gives the formula for computing the sum
of squares between the groups (SS_b). Step 4 indicates that to obtain
the sum of squares within the groups (SS_w) you subtract _____
(symbol) from _____ (symbol).


SS_b, SS_t
21. Using these formulas, calculate the sums of squares for the data
presented in Plate 74. The sums of squares calculated by using
Formula 30 should be identical with those previously computed for
Plate 75.


Plate 77

Computation of Sum of Squares for Arithmetic Scores Presented in Plate 74


            Group A        Group B        Group C        Total

              94             90             86
              92             86             84
              90             84             80
              90             82             78
              86             82             78
              86             80             72
              84             78
              82

ΣX           704            582            478            1764

ΣX^2         62072          48484          38204          148760

X̄            88             83.143         79.667         84

N            8              7              6              21


Step 1    C = (1764)^2 / 21 = 148176

Step 2    SS_t = 148760 - 148176 = 584

Step 3    SS_b = (704)^2/8 + (582)^2/7 + (478)^2/6 - 148176 = 245.8

Step 4    SS_w = 584 - 245.8 = 338.2
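
The four steps in Plate 77 can be reproduced from the raw scores. The following Python sketch is an illustration, not the book's Formula 30 itself (which is not reproduced in this section); it uses the standard raw-score computation that matches the figures in the plate, and the variable names are hypothetical.

```python
# Scores from Plate 74, in the order Group A, Group B, Group C.
groups = [
    [94, 92, 90, 90, 86, 86, 84, 82],
    [90, 86, 84, 82, 82, 80, 78],
    [86, 84, 80, 78, 78, 72],
]

sum_x = [sum(g) for g in groups]                  # sum of scores per group
sum_x2 = [sum(x * x for x in g) for g in groups]  # sum of squared scores per group
n = [len(g) for g in groups]                      # N per group

# Step 1: correction term from the grand total and the total N.
correction = sum(sum_x) ** 2 / sum(n)

# Step 2: total sum of squares.
ss_t = sum(sum_x2) - correction

# Step 3: between-groups sum of squares.
ss_b = sum(sx ** 2 / k for sx, k in zip(sum_x, n)) - correction

# Step 4: within-groups sum of squares by subtraction.
ss_w = ss_t - ss_b

print(sum_x, sum_x2, n)
print(correction, ss_t, round(ss_b, 1), round(ss_w, 1))
```

Working from totals and squared totals avoids computing any deviations, which is what makes this route easier by hand.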
22. NOTE: This is the simplest form of the analysis-of-variance technique.
It can be applied to any number of groups simply by extending
the formulas to include the additional groups. Also, the analysis-of-
variance technique can be applied to experiments involving groups
within groups. However, these advanced techniques are beyond
the scope of this book.


go on to next frame 


23. Formula 20 indicates that in computing the F-ratio, you always use
_____ (symbol) as the numerator and _____ (symbol) as the
denominator.


MS_b, MS_w


24. Therefore, this is a test of whether one variance estimate (MS_b) is
sufficiently greater than the other variance estimate (MS_w). In other
words, you are making a _____-tailed test.


one 


25. Thus, in the analysis-of-variance technique, you are always making
a one-tailed test. Since you enter Table 3 directly to evaluate the
F-ratio in the analysis of variance, it must be a table of significant
values for _____-tailed tests.


one 


26. There is a simpler type of F-ratio you can compute when you wish
only to test the difference between two population variances. Formula
31 indicates that when you have two variance estimates you can obtain
the F-ratio by dividing the _____ variance estimate
by the _____ variance estimate.


larger, smaller 
Plate 78

Research hypothesis: There is a difference in the variability of reading
achievement scores between boys and girls.

Boy sample     s^2 = 15     df = 16

Girl sample    s^2 = 34     df = 12


27. Plate 78 presents the variance estimates and df for samples of boys
and girls on reading achievement scores. The variance estimate
obtained in the sample of boys is _____ (larger/smaller) than that
obtained in the sample of girls.


smaller 


28. PLATE 78. The null hypothesis being tested by the F test here is
that there is no difference between the two population _____.


variances 


29. Formula 31 indicates that, to obtain the F-ratio, you should divide
s^2 of the _____ (boys/girls) by the s^2 of the _____ (boys/
girls). For Plate 78, F = _____.


girls, boys, 2.27 


30. For Plate 78, F = 2.27.

The research hypothesis for these data requires that you make a
_____-tailed test. However, you have learned that Table 3 presents
significant F-ratios for _____-tailed tests.


two, one 
31. To determine significant levels for two-tailed F tests, you must
double the P values in this table. Thus, for two-tailed tests, values
in Roman type are for P = _____, and those in boldface type
are for P = _____.


.1000, .0200 


32. PLATE 78. The df for the larger variance estimate is df = _____ and
for the smaller variance estimate is df = _____. Locate the F-ratios
associated with these two df's in Table 3.


12, 16 


33. Table 3 indicates that, for these degrees of freedom, an F-ratio of 

_is required for the .10 level of significance, and an F-ratio 

of_is required at the .02 level. 


2.42, 3.55 


34. PLATE 78. Your obtained F-ratio of 2.27_(does/ 

does not) reach the .10 level of significance. 


does not 


35. Since the F-ratio of the variances in Plate 78 does not reach the .10 

level of significance, you are justified in_ 

(accepting/rejecting) the null hypothesis. 


accepting 
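
The two-variance F test worked through in the frames above can be sketched as follows. This Python example is illustrative only: the function name variance_ratio is hypothetical, and the critical values 2.42 and 3.55 are those quoted from Table 3 in the frames for df = 12 and df = 16.

```python
def variance_ratio(var_1, df_1, var_2, df_2):
    """Divide the larger variance estimate by the smaller.

    Returns (F, df for the larger estimate, df for the smaller estimate),
    the order in which Table 3 is entered (top, then side).
    """
    if var_1 >= var_2:
        return var_1 / var_2, df_1, df_2
    return var_2 / var_1, df_2, df_1


# Plate 78: boys s^2 = 15 with df = 16; girls s^2 = 34 with df = 12.
f_ratio, df_top, df_side = variance_ratio(15, 16, 34, 12)
print(round(f_ratio, 2), df_top, df_side)

# For a two-tailed test the P values heading Table 3 are doubled, so the
# values printed for .05 and .01 serve as the .10 and .02 levels.
f_at_10, f_at_02 = 2.42, 3.55
print("reaches .10 level:", f_ratio >= f_at_10)
print("reaches .02 level:", f_ratio >= f_at_02)
```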


36. The important point to remember is that Table 3 presents, for one-
tailed tests, F-ratios which are significant at P = _____ and
P = _____. For two-tailed tests, the F-ratios are significant
at P = _____ and P = _____.


.0500, .0100, .1000, .0200 
EXERCISES


1. The following statistics were obtained in Exercise 2 of Set 16. Use
Formula 20 to compute the F-ratio for these data.


Between groups     SS_b = 180     df_b = 2

Within groups      SS_w = 431     df_w = 27

Total              SS_t = 611     df_t = 29



2. Evaluate the significance of the F test computed in Exercise 1, using 
Table 3. Use P = .0100 as your acceptable level for significance. Do 
you reject the null hypothesis on the basis of the F test? 

3. Below are the variance estimates of manual skills scores obtained
from two samples of fourth-grade children. The research hypothesis
is that there is a difference in the variability of manual skills scores
between the populations from which the samples were selected. Use
Formula 31 to compute the F-ratio for these data. Use Table 3 to
evaluate the significance of the F-ratio. Use P = .0100 as the acceptable
level for significance. Do you accept or reject the null
hypothesis?

Group 1     N = 25     s^2 = 215

Group 2     N = 41     s^2 = 72

4. What does the symbol F represent? 


set 18



SCATTER DIAGRAMS 


The statistical techniques presented up to this point have been concerned
with describing frequency distributions or measuring differences
between sets of data in which raw scores were obtained for only one
variable. Thus, each individual was measured on only one characteristic
and statistical inferences were made regarding the distribution of this
characteristic in the population.

Another very useful statistical technique available to the researcher
allows him to measure the relationship between two sets of data obtained
from the same sample, or data from two samples which have been
"matched" on some basis. In such instances, the researcher obtains
two frequency distributions of data and determines the degree to which
the two distributions are related. The statistical term used to describe
this relationship is correlation. Correlation is defined in this set and
its different degrees are discussed. The distinction between positive 
and negative correlations will be presented, along with a method for 
depicting the relationship of two frequency distributions graphically by 
plotting the raw scores in a scatter diagram. 



SPECIFIC OBJECTIVES OF SET 18 


At the conclusion of this set you will be able to: 

(1) depict the relationship between two sets of scores by drawing a 
scatter diagram. 

(2) describe what is meant by a positive correlation. 

(3) describe what is meant by a negative correlation. 

(4) describe what is meant by a perfect and a moderate correlation. 

(5) describe what is meant by a zero correlation. 



1. If you have obtained a set of spelling scores, it may be said that you
have data on the variable of spelling scores. If you have obtained
the mental ages of a group of children, you have data on the
_____ of mental age.


variable 


2. "Arithmetic test scores" is called a variable because, among a group
of students, these scores will _____.


vary 


3. Up to this point, you have been concerned with data for only one 
variable. If you obtain arithmetic test scores and achievement test 

scores for each member of a group, you have data on_ 

variables. 


two 


4. When you have data for two variables for a group of people, you may 
wish to know if there is a relationship between the two variables. 

For example, you may wish to determine if there is a_ 

between arithmetic test scores and achievement test scores. 


relationship 


5. If there is a relationship between two variables, it may be said that
the variables are associated. Thus, if the members of the group who 
get high arithmetic test scores also get high achievement test scores, 
this is an indication that these two variables are _. 


associated


Plate 79

         Arithmetic Test    Achievement Test
Child    Scores (X)         Scores (Y)

A        70                 50
B        65                 40
C        60                 30
D        55                 20
E        50                 10

Scatter Diagram

[Scatter diagram: Achievement Test Scores (Y) on the vertical axis, Arithmetic Test Scores (X) on the horizontal axis.]


6. This plate presents the arithmetic test scores and the achievement
test scores for five children. For child A, the arithmetic test score
is 70 and the achievement test score is _____.


50 


7. PLATE 79. Notice that the arithmetic test scores for the five 
children are listed in order of magnitude, with child A being the 
highest with 70, and child E being the lowest with_ 


50


8. PLATE 79. Notice also that the achievement test scores for the
group are in the same order as the arithmetic test scores, with
child A having 50, the highest score, and child E having _____, the
lowest score.


10 


9. PLATE 79. On the variable of arithmetic test scores, there is a
difference of 5 points between each of the scores. On the variable
of achievement test scores, there is a difference of _____ points
between each of the scores.


10 


10. PLATE 79. In this plate, there are two test scores for each child — 

an_test score and an_ 

test score. 


arithmetic, achievement 


11. PLATE 79. To get a picture of the relationship, or association, 
between these two variables, the data may be depicted graphically 
as is done at the bottom of the plate. Notice that the achievement 
test scores are listed on the vertical axis of the graph, and the 

arithmetic test scores are listed on the_ 

axis. 


horizontal 


12. PLATE 79. Notice that there are five dots in this graph. Each dot
represents two scores for a child: an achievement test score and
an _____ test score.


arithmetic 










13. PLATE 79. The location of each dot in the diagram designates two 
test scores for each child. Consider child D. His arithmetic test 
score is_and his achievement test score is_. 


55, 20 


14. PLATE 79. It is general practice to designate all scores on the 
vertical axis as Y scores. Thus, in this plate, Y represents an 
__(achievement/arithmetic) score. 


achievement 


15. PLATE 79. It is also general practice to designate all scores on
the horizontal axis as X scores. Thus, in this plate, X represents
an _____ (arithmetic/achievement) score.


arithmetic


16. PLATE 79. Child D: X = 55, Y = _____


20


17. PLATE 79. Child D: X = 55, Y = 20.

The dotted lines in this diagram locate the position of child D. The
achievement score (Y) is located directly across from score value
20 on the vertical axis. The arithmetic score (X) is located directly
above score value _____ on the _____ axis.


55, horizontal


18. PLATE 79. The one dot located at the intersection of the dotted
lines represents both scores for child _____.


D


19. PLATE 79. The dot representing child A is located directly across
from score value _____ on the Y axis, and directly above score value
_____ on the X axis.


50, 70 


20. PLATE 79. Notice that the five dots in this plate represent the
arithmetic and achievement scores for each of the five children.
Thus, one dot represents _____ scores for each child.


two


21. PLATE 79. The diagram in this plate is called a scatter diagram
because it indicates how the scores of the children _____
when they are plotted in this manner.


scatter


22. PLATE 79. The scores on variable X are perfectly related, or
associated, with the scores on variable Y in this plate, because,
as X increases, Y also _____.


increases 


23. PLATE 79. When each increase in X scores is matched by an increase
in Y scores, the two variables are said to be perfectly
_____.


related


24. The statistical term for the amount of association between two
variables is correlation. Thus, it can be said that there is a perfect
_____ between X and Y scores in Plate 79.


correlation 


25. Two variables are said to be correlated when the scores on the two
variables are associated. Thus, in Plate 79, there is a correlation
between the children's arithmetic test scores and their _____
test scores.


achievement


26. PLATE 79. Notice that the dots in the scatter diagram fall in a
straight line, from the lower left corner of the diagram to the upper
_____ corner.


right


27. PLATE 79. There is a positive correlation between the X and Y
variables in this plate, because with each increase in X scores,
there is a corresponding increase in _____ scores.


Y


28. When the X scores of a scatter diagram are directly associated with
the Y scores, it is called a _____ (positive/negative)
correlation.


positive 
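
As a rough check on what the scatter diagram in Plate 79 shows, the sketch below takes the five pairs of scores from the plate and computes the slope between neighbouring points. The helper name `neighbour_slopes` is an illustrative choice and does not come from the text. If every slope is the same, the dots fall on one straight line, and a positive slope means the line runs from the lower left to the upper right.

```python
# Plate 79 scores: (arithmetic score X, achievement score Y) for children A-E.
plate_79 = [(70, 50), (65, 40), (60, 30), (55, 20), (50, 10)]


def neighbour_slopes(pairs):
    """Return the slope between each pair of neighbouring points, ordered by X.

    Assumes no two points share the same X score (true for Plate 79).
    """
    ordered = sorted(pairs)
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(ordered, ordered[1:])]


slopes = neighbour_slopes(plate_79)
on_one_line = len(set(slopes)) == 1        # identical slopes: dots form a straight line
direction = "positive" if slopes[0] > 0 else "negative"

print(slopes)                  # every 5-point step in X goes with a 10-point step in Y
print(on_one_line, direction)
```

With real data the slopes will rarely agree exactly, which is why the later plates speak of a general trend rather than a single line.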
Plate 80



29. PLATE 80b. This plate presents another scatter diagram. The dots
in this diagram run from the upper left corner to the _____
_____ corner.


lower right


30. PLATE 80b. In this diagram, the X scores are _____
(directly/inversely) associated with the Y scores.


inversely 


31. PLATE 80b. The X scores are inversely related to the Y scores 
because as each X score increases, each Y score ___. 


decreases 


32. When the X and Y scores are directly related as in Plate 80a, the 
correlation is positive. When the X and Y scores are inversely 
related, as in Plate 80b, the correlation is_. 


negative 


33. When the dots fall on the diagonal of a scatter diagram, the correlation
is perfect. Thus, in Plate 80a, there is a perfect positive correlation.
In Plate 80b, there is a perfect _____
correlation.


negative 


34. A perfect positive correlation exists whenever there is a __ 

(direct/inverse) perfect association between the X scores and the Y 
scores. 


direct 


35. A perfect negative correlation exists whenever there is a(n)
_____ (direct/inverse) perfect association between the X
scores and the Y scores.


inverse


36. Plate 80a is a scatter diagram of a perfect _____
correlation.


positive


37. Plate 80b is a scatter diagram of a perfect _____
correlation.


negative 


38. Although correlations may be positive or negative, they generally 

are not perfect. Plate 80c presents a moderate_ 

correlation. 


positive 


39. PLATE 80c. (The values of X and Y are immaterial. Only the
placement of the dots is important.) In this plate, every increase
in X scores _____ (is/is not) matched by a corresponding
increase in Y scores.


is not


40. PLATE 80c. The general trend of the dots in this scatter diagram
is from the lower left corner to the upper right corner; so the correlation,
although not perfect, is a _____ (positive/negative) one.


positive


41. PLATE 80c. This scatter diagram presents a moderate positive
correlation because the data generally lie in a positive direction,
although they _____ (do/do not) all lie on the diagonal of the
diagram.


do not


42. PLATE 80d. The general trend of the data in this scatter diagram
is in a _____ (positive/negative) direction.


negative 


43. PLATE 80d. This scatter diagram presents a moderate negative 
correlation because, although the dots lie in a generally negative 
direction, they do not all lie on the __of the diagram. 


diagonal 


44. PLATE 80e. This scatter diagram presents a zero correlation 
because there is no trend among the dots. In this diagram the scores 

on the X variable_(are/are not) associated with the 

scores on the Y variable. 


are not 


45. PLATE 80e. A zero correlation indicates that there is _____ relationship
between the scores on the X variable and the scores on
the Y variable.


no 


46. The scatter diagrams in Plate 80 indicate that correlations may 
range from perfect positive to perfect_correlations. 


negative


EXERCISES


1. Below are the final examination scores in algebra and English for 
eight students. Draw a scatter diagram of these scores. Let the 
algebra scores be the X variable. 


Student    Algebra Scores    English Scores

A          81                93
B          84                97
C          86                98
D          82                94
E          85                96
F          82                95
G          83                94
H          84                95


2. What is meant by a perfect positive correlation between two variables?
A moderate positive correlation?

3. What is meant by a perfect negative correlation between two variables?
A moderate negative correlation?

4. Does the scatter diagram you prepared in Exercise 1 indicate a 
positive or a negative correlation? Is it a perfect correlation? 

5. What is meant by a zero correlation?


set 19

PEARSON PRODUCT-MOMENT CORRELATION


To express the relationship between two variables statistically, we
must have some numerical index of the degree of correlation. This index
is termed the correlation coefficient, and its magnitude indicates the degree
to which two frequency distributions of data are related.

There are many different types of correlation analysis available to the
researcher. Which one is appropriate for his use depends upon the nature
of the data he is analyzing. This set presents a method for computing
one type of correlation coefficient, the Pearson product-moment correlation
coefficient. This is one of the more common correlation techniques.
In order to use it properly, however, the researcher must assume that
the variables are linearly related and the scores on each variable are
normally distributed. If these assumptions cannot be made, this type of
correlation analysis is inappropriate and other techniques must be used.
Set 22 of this book describes a correlation technique which can be used
for data not meeting the above assumptions.

This set presents the method for evaluating the significance of the
Pearson product-moment correlation coefficient by using Table 4, and
the basis for determining whether to accept or reject the null hypothesis.
We will also discuss one- and two-tailed tests of significance for correlation
coefficients.


SPECIFIC OBJECTIVES OF SET 19 

At the conclusion of this set you will be able to: 

(1) state the coefficient which denotes a perfect positive correlation. 

(2) state the coefficient which denotes a perfect negative correlation. 

(3) use Formula 32 to calculate a Pearson product-moment correlation
coefficient when given data on two variables for the same
sample.

(4) state the null hypothesis regarding correlated variables. 

(5) use Table 4 to determine the level of significance of a Pearson 
product-moment correlation coefficient. 

(6) make a research hypothesis which requires you to use a one- 
tailed test of significance. 

(7) use Table 4 for determining significance levels for one- and 
for two-tailed tests of significance. 





1. In statistics, the degree of correlation between two variables is
indicated by a correlation coefficient. The most commonly used
correlation coefficient is called the Pearson product-moment
correlation _____.


coefficient 


2. The symbol for the Pearson product-moment correlation coefficient 

is r. This coefficient indicates the degree of_ 

between two variables. 


correlation 


3. PLATE 80a, page 246. As indicated in this plate, the Pearson
product-moment coefficient that indicates a perfect positive correlation
is r = _____.


1.00


4. PLATE 80b, page 246. A perfect negative correlation has the value
of r = _____.


-1.00


5. When no correlation exists between two variables, r = _____.


.00


6. PLATE 80, page 246. As indicated by the scatter diagrams, the
possible range of correlation coefficients is from -1.00 to _____.


1.00


7. PLATE 80c, page 246. The correlation coefficient for this scatter
diagram is r = _____. This indicates a moderate positive correlation.


.65


8. PLATE 80d, page 246. The correlation coefficient for this scatter
diagram is r = -.62. This indicates a moderate _____
correlation.


negative 


9. All correlation coefficients that lie between 1.00 and .00 indicate a
positive relationship. Correlation coefficients that lie between .00
and -1.00 indicate a _____ relationship.


negative


10. A correlation coefficient of r = .70 indicates a _____
(smaller/greater) degree of association than r = .50.


greater


11. Correlation coefficients of r = .80 and r = -.80 indicate the same
degree of association: r = .80 indicates a positive association,
whereas r = -.80 indicates a _____ association.


negative


12. Correlation coefficients of r = -.20 and r = .20 indicate _____
(differing/the same) degree of association.


the same


13. Correlation coefficients range from _____ to _____.


-1.00, 1.00 


14. FORMULA 32. This formula is to be used for the calculation of the
Pearson _____-_____ correlation coefficient.


product-moment


15. FORMULA 32. The formula is used to determine the correlation
coefficient between two variables. The symbol for a raw score on
one variable is X. The symbol for a raw score on the other variable
is _____ (symbol).


Y 


16. FORMULA 32. The expression ΣXY indicates that you multiply the
X score times the Y score for each individual, and then _____ all
the products.


sum


17. FORMULA 32. The expression NΣXY indicates that you multiply
the sum of the products by _____ (symbol).


N


18. FORMULA 32. The expression (ΣX)(ΣY) indicates that you sum the
X scores, and sum the Y scores, and then _____ these
two sums.


multiply


19. FORMULA 32. The expression NΣX² indicates that you _____ the
squares of the X scores and multiply by _____ (symbol).


sum, N


20. FORMULA 32. The expression (ΣX)² indicates that you sum the X
scores and then _____ the sum.


square


21. The expressions ΣX² and (ΣX)² differ in that ΣX² indicates that you
square the scores first and then _____ them, while (ΣX)² indicates
that you sum the scores first and _____ the sum.


sum, square 
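
The order of operations in these expressions can be checked with a short sketch. It uses the X and Y scores that appear in Plate 81; the variable names are illustrative and are not taken from the text.

```python
# X and Y scores as they appear in Plate 81.
x_scores = [1, 2, 4, 5, 6, 7]
y_scores = [5, 10, 20, 25, 30, 35]
n = len(x_scores)

sum_x = sum(x_scores)                                     # ΣX
sum_y = sum(y_scores)                                     # ΣY
sum_x_squared = sum(x * x for x in x_scores)              # ΣX²: square each score, then sum
square_of_sum_x = sum_x ** 2                              # (ΣX)²: sum the scores, then square
sum_xy = sum(x * y for x, y in zip(x_scores, y_scores))   # ΣXY: multiply each pair, then sum

print(sum_x_squared, square_of_sum_x)   # 131 625
print(n * sum_xy, sum_x * sum_y)        # 3930 3125
```

The two results on the first printed line differ widely, which is why the placement of the parentheses matters when substituting into Formula 32.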
22. PLATE 80a. This is a scatter diagram of a perfect positive correlation.
The scores plotted here are presented in Plate 81. To compute
the value of r using Formula 32, you must determine six values:
N, ΣX, ΣY, ΣX², _____, and _____.


ΣY², ΣXY


Plate 81 


         (Col. 1)   (Col. 2)   (Col. 3)   (Col. 4)   (Col. 5)
Child       X          Y          X²         Y²         XY

A           1          5           1         25          5
B           2         10           4        100         20
C           4         20          16        400         80
D           5         25          25        625        125
E           6         30          36        900        180
F           7         35          49       1225        245

N = 6    ΣX = 25    ΣY = 125    ΣX² = 131    ΣY² = 3275    ΣXY = 655

(Formula 32)

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2]\,[N\sum Y^2 - (\sum Y)^2]}}$$

$$= \frac{6(655) - (25)(125)}{\sqrt{[6(131) - (25)^2]\,[6(3275) - (125)^2]}}$$

$$= \frac{805}{\sqrt{(161)(4025)}} = \frac{805}{\sqrt{648{,}025}} = \frac{805}{805} = 1.00$$


23. In this example, N = _____. Small numbers of scores are used for
illustration only. Of course, the same procedures apply for any
number of scores.


6


24. PLATE 81. Each X score is presented in column 1. Each Y score
is presented in column 2. The square of each X score is presented
in column 3. The square of each Y score is presented in column _____.


4 


25. PLATE 81. The product of each X score with its Y score is represented
by the symbol XY and is presented in column _____.


5


26. PLATE 81. The sum of each of the columns in the plate gives you
the values needed for substitution in Formula 32. In this example:
N = 6, ΣX = 25, ΣY = 125, ΣX² = _____, ΣY² = _____, and
ΣXY = _____.


131, 3275, 655


27. PLATE 81. The numerator of Formula 32 is NΣXY - (ΣX)(ΣY), which
indicates that you multiply N times ΣXY and subtract the product
of _____ (symbol) and _____ (symbol).


(ΣX), (ΣY)


28. PLATE 81. The denominator of Formula 32 indicates that you do
the necessary computation within the brackets and then multiply the
two figures. Then you extract the _____ _____ of the
product.


square root


29. PLATE 81. Check the substitution of the numerical values for the
symbols in Formula 32. Also, check the arithmetic. In this example,
r = 1.00, which indicates a perfect _____ correlation.


positive 
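
The arithmetic in Plate 81 can be checked with a minimal sketch of Formula 32 as a function. The name `pearson_r` is an illustrative choice; the function assumes both lists have the same length and that neither variable is constant (otherwise the denominator would be zero).

```python
import math


def pearson_r(x_scores, y_scores):
    """Pearson product-moment r computed from raw scores, following Formula 32."""
    n = len(x_scores)
    sum_x, sum_y = sum(x_scores), sum(y_scores)
    sum_x2 = sum(x * x for x in x_scores)
    sum_y2 = sum(y * y for y in y_scores)
    sum_xy = sum(x * y for x, y in zip(x_scores, y_scores))

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Plate 81 data (children A-F).
plate_81_x = [1, 2, 4, 5, 6, 7]
plate_81_y = [5, 10, 20, 25, 30, 35]

r = pearson_r(plate_81_x, plate_81_y)
print(round(r, 2))   # 1.0
```

Every Y score in Plate 81 is exactly five times its X score, so a coefficient of 1.00 is what the formula should return.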


Plate 82 


Person    X    Y

A         4    9
B         1    4
C         3    1
D         6    7
E         5    3
F         4    2


30. This plate presents data for six persons, each having an X score
and a Y score. Prepare a scatter diagram for these data. 


Plate 83

[Scatter diagram of the data in Plate 82, with Y on the vertical axis and X on the horizontal axis.]


31. PLATE 83. This scatter diagram of the data in Plate 82, although
far from representing a perfect correlation, indicates a slight
_____ (positive/negative) relationship.


positive 


32. PLATE 83. The value of r ranges from -1.00 to 1.00. If there is a 
slight positive correlation between X and Y in this plate, the value 
of r must lie somewhere between the values of __and_. 


.00, 1.00 


33. PLATE 82. In order to compute the value of r using the column
headings of X, Y, X², Y², and XY (as in Plate 81), do the necessary
arithmetic to determine the following: N = _____, ΣX = _____, ΣY = _____,
ΣX² = _____, ΣY² = _____, and ΣXY = _____.


6, 23, 26, 103, 160, 108


34. PLATE 82. N = 6, ΣX = 23, ΣY = 26, ΣX² = 103, ΣY² = 160,
ΣXY = 108. Substitute these values for the symbols in Formula 32
and do the necessary calculations to determine the value of r.
r = _____


r = .31 

(see below to check your computation) 


Plate 84 


Child     X      Y      X²      Y²      XY

A         4      9      16      81      36
B         1      4       1      16       4
C         3      1       9       1       3
D         6      7      36      49      42
E         5      3      25       9      15
F         4      2      16       4       8

N = 6    ΣX = 23    ΣY = 26    ΣX² = 103    ΣY² = 160    ΣXY = 108

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2]\,[N\sum Y^2 - (\sum Y)^2]}}$$

$$= \frac{6(108) - (23)(26)}{\sqrt{[6(103) - (23)^2]\,[6(160) - (26)^2]}}$$

$$= \frac{50}{\sqrt{(89)(284)}} = \frac{50}{\sqrt{25276}} = \frac{50}{158.98} = .31$$


35. PLATE 84. r = .31.

This indicates that there is a slight positive correlation between the
X and Y variables for the six people. If you had obtained a value of
r = .60, this would have indicated a _____ (greater/lesser)
correlation between the two variables.


greater


36. As indicated in the two examples that have been presented, when,
in the numerator of Formula 32, NΣXY is greater than (ΣX)(ΣY),
the correlation is positive. When NΣXY is less than (ΣX)(ΣY), the
correlation must be _____.


negative 


37. FORMULA 32. When the numerator term, NΣXY - (ΣX)(ΣY), = 0,
then r must equal _____.


.00 
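
The role of the numerator can be seen by computing the two parts of Formula 32 separately. The sketch below uses the column totals from Plate 84; the variable names are illustrative only.

```python
import math

# Column totals from Plate 84 (six persons).
n, sum_x, sum_y = 6, 23, 26
sum_x2, sum_y2, sum_xy = 103, 160, 108

numerator = n * sum_xy - sum_x * sum_y          # 6(108) - (23)(26)
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# The denominator is a square root, so it is never negative; the sign of r
# therefore comes entirely from the numerator.
if numerator > 0:
    sign = "positive"
elif numerator < 0:
    sign = "negative"
else:
    sign = "zero"

print(numerator, round(denominator, 2), round(r, 2), sign)
```

For these totals the numerator is 50, so the correlation is positive, in agreement with r = .31 in Plate 84.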


38. In testing hypotheses regarding correlations, the null hypothesis
states, "There is _____ correlation between variable X and variable
Y."


no 


39. PLATE 84. The correlation between variables X and Y is r = .31. 
You need to determine if this value is sufficiently large to permit 
you to reject the_hypothesis. 


null 


40. PLATE 84. To evaluate the significance of r, use Table 4. This
table presents the values of r that are required for P = .1000,
P = .0500, P = _____, and P = _____ for differing degrees
of freedom (df).


.0200, .0100 


41. To determine the value of r that is significant for the differing P 
values, you must first determine the df for the data. Formula 32 
indicates that the df to be used in evaluating r is df =_. 


N - 2


42. FORMULA 32. To evaluate the significance of r, the df = N - 2. In
this case, N is the number of individuals involved in the correlation.
Thus, N is the number of pairs of scores. In Plate 84, df = _____.


4


43. PLATE 84. r = .31, df = 4.

From Table 4, for df = 4, the value of r necessary for significance
at the .05 level is _____.


.81 


44. PLATE 84. r = .31, df = 4.

From Table 4, significance at the .05 level would require an r of .81.

Therefore, because the obtained r is only .31, you _____
(accept/reject) the null hypothesis that there is no correlation
between the X and Y variables.


accept 


45. PLATE 84. r = .31, df = 4. From Table 4, for P = .0500, r = .81.
You would accept the null hypothesis that there is no correlation
between the two variables. This means that you conclude that the
obtained r of .31 was due to _____ error.


sampling


46. PLATE 84. r = .31, df = 4.

Table 4. If the df for these data had been 50, then the required r for
P = .0500 would be _____.


.273


47. Consider: r = .31, df = 50.

From Table 4, the r required for significance at the .05 level is
.273. Therefore, an obtained r of .31 would be considered
_____ (significant/non-significant).


significant


48. Consider: r = .31, df = 50.

An obtained r of .31 is significant beyond the .05 level because this 

r is_ (larger/smaller) than the required value of r 

at the .05 level as presented in Table 4. 


larger 


49. Consider: r = .31, df = 50.

Because r is beyond the .05 level of significance, it may be said 
that, if you reject the null hypothesis, the probability of a Type I 
error is_(less/greater) than P = .0500. 


less 
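
The tabled values of r quoted in these frames can be reproduced approximately from critical values of t, using the standard identity r = t / √(t² + df). This identity and the two critical t values below are not given in this text; the t values are assumed here from a standard two-tailed t table at the .05 level, and the function names are illustrative.

```python
import math


def critical_r(t_critical, df):
    """Convert a critical t value into the matching critical r for the same df."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


def degrees_of_freedom(n_pairs):
    """df for testing a Pearson r, where n_pairs is the number of pairs of scores."""
    return n_pairs - 2


# Two-tailed .05 critical t values, copied from a standard t table (assumed here).
T_CRITICAL_05 = {4: 2.776, 50: 2.009}

df_plate_84 = degrees_of_freedom(6)   # six pairs of scores in Plate 84
obtained_r = 0.31

for df, t_crit in T_CRITICAL_05.items():
    required = critical_r(t_crit, df)
    decision = "reject" if abs(obtained_r) > required else "accept"
    print(df, round(required, 3), decision)
```

The same obtained r of .31 leads to accepting the null hypothesis at df = 4 and rejecting it at df = 50, because the required r shrinks as the number of pairs grows.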


50. If, for a particular df, the obtained r exceeds the tabled value of
r at P = .0500, the correlation is said to be significant beyond the
.05 level. If the obtained r exceeds the tabled value of r at P = .0100,
the correlation is said to be significant beyond the _____ level.


.01


51. Table 4 is for two-tailed significance tests. Therefore, the values
in this table apply to both positive and _____ correlations.


negative


52. If you make the research hypothesis that your correlation will be
positive, you have made a _____ (one/two) tailed hypothesis.


one


53. If, in your research hypothesis, you do not state whether the correlation
is positive or negative, you should use a _____ (one/two)
tailed test of significance.


two


54. If you make a one-tailed hypothesis, you may make a one-tailed test
of significance. This is done in exactly the same manner as with
the t test.

TABLE 4. For a one-tailed test, the values of r that will be significant
at the .05 level are listed in the column headed P = _____.


.1000


55. TABLE 4. For a one-tailed test, the r's at the P = .0100 level are
listed in the column headed P = _____.


.0200 








EXERCISES 

1. What magnitude must a Pearson product-moment correlation coefficient
have to denote a perfect positive correlation? A perfect
negative correlation?

2. For the following set of data, the researcher has made this hypothesis:
"There is a relationship between the arithmetic scores and
the English scores." Use Formula 32 to compute the Pearson
product-moment correlation coefficient for these data.


Student    Arithmetic Scores    English Scores

A          20                   18
B          18                   22
C          17                   15
D          16                   17
E          14                    8
F          14                   20
G          12                    9
H           9                    7


3. Does the hypothesis stated in Exercise 2 require a one-tailed or a
two-tailed test of significance? Using Table 4, determine the magnitude
of the r required for significance at the .05 level. At the .01
level.

4. State the null hypothesis being tested in Exercise 2. Using P = .0500 
as your acceptable level for significance, would you accept or reject 
the null hypothesis? 

5. For the following set of data, the researcher has made this hypothesis:
"There is a positive relationship between the spelling scores
and the English scores." Compute the Pearson product-moment
correlation coefficient for these data, using Formula 32.


           Spelling    English                Spelling    English
Student    Scores      Scores      Student    Scores      Scores

A          32          20          G          25          10
B          29          17          H          25           8
C          28          17          I          21           9
D          27          18          J          20           6
E          27          17          K          15           8
F          27          12

6. Does the hypothesis stated in Exercise 5 require a one-tailed or a
two-tailed test of significance? Using Table 4, determine the magnitude
of the r required for significance at the .05 level. At the .01
level.

7. State the null hypothesis being tested in Exercise 5. If you designate
P = .0100 as your level of significance, would you accept or
reject the null hypothesis?

8. Identify the symbol r.



set 20 

REGRESSION COEFFICIENT 


If arithmetic scores and intelligence test scores are correlated, it is
possible to predict the value of a person's arithmetic score on the basis
of his intelligence test score. Likewise, if a person's arithmetic score
is known, it is possible to predict his intelligence test score.

This set demonstrates the use of the regression coefficient in making
such predictions, and introduces and explains the new terms regression
line, best fit line, and regression equation. A method is given for plotting
regression lines for both variables, using the regression equations, and
there is a discussion of the relationship between regression lines in cases
where the two variables are not perfectly correlated.

SPECIFIC OBJECTIVES OF SET 20 

At the conclusion of this set you will be able to: 

(1) state what is meant by regression or best fit line. 



(2) use Formula 33 to calculate a regression equation for the regression
of Y on X.

(3) use Formula 34 to calculate a regression equation for the regression
of X on Y.


(4) plot the regression lines in a scatter diagram. 

(5) predict, using the regression equation, a score on one variable 
that is associated with a given score on the other variable. 

(6) identify the symbols Ỹ, byx, X̃, bxy.



1. If two variables are correlated, you can predict the value of the
score on one variable for an individual if you know his score on the 
other variable. Thus, if you know one child’s achievement test 

score, it is possible for you to_his arithmetic test 

score. 


predict


2. When you have two correlated variables, it is possible to predict
the Y score of an individual on the basis of his X score. If height 
and age are correlated, you may predict an individual’s height if 
you know his_. 


age 


Plate 85

Scatter Diagram with the Regression Line of Y on X

[Scatter diagram: Test Scores (Y) on the vertical axis, Chronological Age (X) on the horizontal axis, with the regression line of Y on X drawn through the points.]


3. This plate presents the scatter diagram of test scores for eight
children at various ages. For example: the child who is one year
old has a test score of 2; the child who is six years old has a test
score of _____.


4


4. PLATE 85. The trend of the data in this plate indicates that there
is a _____ (positive/negative) correlation between these
two variables.


positive


5. PLATE 85. The line that is shown in this plate is called a regression
line. This particular line shows the regression of variable Y (test
scores) on variable X (_____).


chronological age


6. PLATE 85. Although the dots representing the test scores at the
various ages do not lie exactly on this line, it is the one straight
line that best fits the data. Therefore, this line is called the
regression line of Y on X, or the _____ _____ line.


best fit


7. PLATE 85. This best fit, or regression, line can be used when you
wish to estimate the value of a test score (Y) for a child at a particular
chronological age (_____) (symbol).


X


8. PLATE 85. Although you do not have a nine-year-old child in your
sample, it is possible to predict the score value for that age child
by using the best fit or _____ line.


regression


9. PLATE 85. Thus, using the regression line, the best estimate of
the test score for a nine-year-old child is _____. (See dotted lines.)


10. PLATE 85. The six-year-old child in the sample had a test score
of _____. However, based on the regression line, the best estimate of
the test score for all six-year-old children is _____.


4, 5.5 


11. Although the regression line is drawn using sample data, it enables
you to predict the most likely value of test scores for the
_____ (sample/population).


population


12. Thus, using the regression line, it is possible to determine the best
estimate of a value on variable Y for any particular value on variable
_____ (symbol).


X


13. The position of the regression line in a scatter diagram is determined
algebraically by use of Formula 33a. This formula is called
a _____ _____.


regression equation 

14. FORMULA 33a. The symbol Ỹ is used to indicate the _____
value of Y that corresponds with any particular score value of X.


predicted


15. FORMULA 33b. The symbol byx is called the regression coefficient.
The subscript, yx, in this symbol indicates that this is the regression
coefficient for variable Y on variable _____ (symbol).


X


16. FORMULA 33b. In this formula for byx the numerator is divided by
the sum of _____ of the X variable. (Recall Formula 12.)


squares


17. FORMULA 33c. The symbol a is used to represent a constant to be
used in Formula 33a. It is obtained by multiplying byx times X̄ and
subtracting the product from _____ (symbol).


Ȳ


Plate 86 


Computation of the Regression Equation of Y on X

              Chronological    Test
              Ages             Scores
Individual    X                Y         X²      Y²      XY

A             10                7        100      49      70
B              8                9         64      81      72
C              7               10         49     100      70
D              6                4         36      16      24
E              4                2         16       4       8
F              3                1          9       1       3
G              2                3          4       9       6
H              1                2          1       4       2

N = 8    ΣX = 41    ΣY = 38    ΣX² = 279    ΣY² = 264    ΣXY = 255

Formula 33b

$$b_{yx} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sum X^2 - \dfrac{(\sum X)^2}{N}} = \frac{255 - \dfrac{(41)(38)}{8}}{279 - \dfrac{(41)^2}{8}} = \frac{60.25}{68.88} = .87$$

$$\bar{X} = \frac{41}{8} = 5.12 \qquad \bar{Y} = \frac{38}{8} = 4.75$$

Formula 33c

$$a = \bar{Y} - b_{yx}\bar{X} = 4.75 - (.87)(5.12) = .30$$



18. This plate presents the data plotted in the scatter diagram in Plate
85. For these data, N = _____.


8


19. PLATE 86. To determine the values to be used in Formula 33a for
the regression equation, first you must determine the value of $b_{yx}$,
which is called the _____ _____.


regression coefficient 


20. PLATE 86. To solve for $b_{yx}$, first you must substitute the numerical
values for the symbols in Formula 33b. Check the substitution in the
formula with the values obtained from the columns at the top of the
plate. In this plate, $b_{yx}$ = _____.


.87


21. PLATE 86. $b_{yx}$ = .87.

To determine the value of a, use Formula 33c. In order to determine
a, you must first compute the mean of each variable. Check these
calculations in Plate 86. $\bar{X}$ = _____, $\bar{Y}$ = _____.


5.12, 4.75 


22. PLATE 86. $b_{yx}$ = .87, $\bar{X}$ = 5.12, $\bar{Y}$ = 4.75.

To solve for a, substitute the numerical values for the symbols in
Formula 33c. Check these substitutions in Plate 86. a = _____.


.30


23. PLATE 86. a = .30, $b_{yx}$ = .87.

With this information, you may now determine the regression equation
for this set of data. Substitute in Formula 33a the numerical values
for the symbols a and $b_{yx}$. $\tilde{Y}$ = _____ + _____ X.


.30, .87 


24. PLATE 86. The regression equation for this set of data is
$\tilde{Y}$ = .30 + .87X. In order to make the best prediction of a score on
the Y variable ($\tilde{Y}$) corresponding to a score on the X variable, you
must substitute the value of X in Formula 33a and solve for _____
(symbol).


$\tilde{Y}$


25. PLATE 86. $\tilde{Y}$ = .30 + .87X.

Suppose you wish to predict the test score value ($\tilde{Y}$) for a
five-year-old child (i.e., X = 5). Substitute 5 for X in Formula 33a and
solve for $\tilde{Y}$. $\tilde{Y}$ = _____.


$\tilde{Y}$ = .30 + (.87)5 = 4.65




26. PLATE 86. $\tilde{Y}$ = .30 + .87X.

For X = 5, $\tilde{Y}$ = 4.65. Thus, using the data in the plate,
the best estimate or prediction of the test score for a five-year-old
child is 4.65. Using Formula 33a, determine the best estimate or
prediction of the test score for a ten-year-old child. $\tilde{Y}$ = _____.


9.00 [.30 + (.87)10]


27. The position of the regression line is determined by calculating two
values of $\tilde{Y}$, plotting them on the scatter diagram, and drawing a
straight line through them. Notice, in Plate 85, that the regression
line is determined by the values you have just determined: for X = 5,
$\tilde{Y}$ = 4.65; for X = 10, $\tilde{Y}$ = 9.00. This line represents the regression
of Y on _____ (symbol).


X 
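
The arithmetic of Plate 86 (Formulas 33a through 33c) can be retraced in a few lines of Python. This is only an illustrative sketch: the function name `regression_y_on_x` and the variable names are assumptions of the example, not part of the program. The sketch also carries full precision instead of rounding each step to two places, so its constant comes out near .27 where the plate shows .30, and its predictions differ from the plate's in the second decimal place.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b_yx) for the line Y~ = a + b_yx * X (Formulas 33a-33c)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Formula 33b: the numerator is divided by the sum of squares of X.
    b_yx = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Formula 33c: mean of Y minus b_yx times mean of X.
    a = sum_y / n - b_yx * (sum_x / n)
    return a, b_yx


# Data from Plate 86: chronological ages (X) and test scores (Y).
ages = [10, 8, 7, 6, 4, 3, 2, 1]
scores = [7, 9, 10, 4, 2, 1, 3, 2]

a, b_yx = regression_y_on_x(ages, scores)
pred_5 = a + b_yx * 5     # predicted test score for a five-year-old
pred_10 = a + b_yx * 10   # predicted test score for a ten-year-old
print(round(b_yx, 2), round(a, 2), round(pred_5, 2), round(pred_10, 2))
```

Plotting the two predicted points and joining them with a straight line gives the regression line of Y on X described in frame 27.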


28. FORMULA 33. This formula is for the regression of Y on X. It is
also possible to determine the _____ of X on Y.


regression


29. Formula 34a presents the formula for determining the regression of
_____ (symbol) on _____ (symbol).


X, Y


30. FORMULA 34a. In this formula the symbol $\tilde{X}$ is used to indicate the
_____ value of X that corresponds with any particular
score value of Y.


predicted


31. FORMULA 33b. In $b_{yx}$ the subscript yx is used to indicate that this
is the regression coefficient of Y on X.

FORMULA 34b. In $b_{xy}$ the subscript xy is used to indicate that this
is the regression coefficient of _____ (symbol) on _____ (symbol).


X, Y


32. FORMULA 34b. The regression coefficient $b_{xy}$ for the regression
of X on Y has as its divisor the sum of _____ of the Y
variable.


squares 


33. In order to determine the regression coefficient of X on Y for the
data in Plate 86, substitute the numerical values for the symbols
in Formula 34b and solve for $b_{xy}$. $b_{xy}$ = _____.


$$b_{xy} = \frac{60.25}{264 - \frac{(38)^2}{8}} = .72$$


34. PLATE 86. $b_{xy}$ = .72.

You have already determined the means of the two variables. From
the plate, these are: $\bar{X}$ = _____, $\bar{Y}$ = _____.


5.12, 4.75


35. PLATE 86. $b_{xy}$ = .72, $\bar{X}$ = 5.12, $\bar{Y}$ = 4.75.

FORMULA 34c. Substitute the numerical values for the symbols in
this formula and solve for a. a = _____.


5.12 - (.72)4.75 = 1.70


36. PLATE 86. a = 1.70, $b_{xy}$ = .72.

FORMULA 34a. Substitute the numerical values for the symbols in
this formula and write the regression equation for X on Y for these
data.


$\tilde{X}$ = 1.70 + .72Y


37. The regression equation for X on Y for the data in Plate 86 is
$\tilde{X}$ = 1.70 + .72Y. Using this equation you can predict the value of
_____ (symbol) for any particular value of _____ (symbol).


X, Y 


38. PLATE 86. $\tilde{X}$ = 1.70 + .72Y.

For the score value Y = 3 determine the predicted value of X. For
Y = 3, $\tilde{X}$ = _____.


3.86


[1.70 + .72(3)]


39. PLATE 86. $\tilde{X}$ = 1.70 + .72Y. For Y = 3, $\tilde{X}$ = 3.86.

This means that if an individual receives a score of 3 on variable Y,
you predict, from the regression equation, that on variable X he will
receive a score of _____.


3.86


40. PLATE 86. $\tilde{X}$ = 1.70 + .72Y.

For score value Y = 8 determine the predicted value of X. For Y = 8,
$\tilde{X}$ = _____.


7.46 


[1.70 + .72(8)] 
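
The regression of X on Y (Formulas 34a through 34c) follows the same pattern with the roles of the two variables exchanged. The sketch below is illustrative only; the function name `regression_x_on_y` is an assumption of the example. Because it does not round intermediate values, its prediction for Y = 8 comes out near 7.47 where the frame's rounded arithmetic gives 7.46.

```python
def regression_x_on_y(xs, ys):
    """Return (a, b_xy) for the line X~ = a + b_xy * Y (Formulas 34a-34c)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Formula 34b: same numerator as before, but the divisor is now
    # the sum of squares of the Y variable.
    b_xy = (sum_xy - sum_x * sum_y / n) / (sum_y2 - sum_y ** 2 / n)
    # Formula 34c: mean of X minus b_xy times mean of Y.
    a = sum_x / n - b_xy * (sum_y / n)
    return a, b_xy


# Data from Plate 86.
ages = [10, 8, 7, 6, 4, 3, 2, 1]      # X
scores = [7, 9, 10, 4, 2, 1, 3, 2]    # Y

a, b_xy = regression_x_on_y(ages, scores)
pred_3 = a + b_xy * 3   # predicted age for a test score of 3
pred_8 = a + b_xy * 8   # predicted age for a test score of 8
print(round(b_xy, 2), round(a, 2), round(pred_3, 2), round(pred_8, 2))
```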


41. Recall that Plate 85 presented the scatter diagram for the data in
Plate 86. Recall also that the regression line shown in Plate 85
depicted the regression of _____ (symbol) on _____ (symbol).


Y, X


42. PLATE 86. $\tilde{X}$ = 1.70 + .72Y. For Y = 3, $\tilde{X}$ = 3.86; for Y = 8,
$\tilde{X}$ = 7.46. In Plate 85 the regression line shown is for Y on X. Using
the above data, you can also draw the regression line representing
_____ (symbol) on _____ (symbol).


X, Y 


Plate 87

Scatter Diagram Indicating Regression Lines

[Scatter diagram. Horizontal axis: Chronological Ages (X), scaled 0 to 10.]

43. In Plate 87, which is for the same data as Plate 86, the dotted line
represents the regression of X on Y. This line is determined by the
calculations you have just made: for Y = 3, $\tilde{X}$ = 3.86 (plotted at point
A on graph); for Y = 8, $\tilde{X}$ = 7.46 (plotted at point _____ on graph).


B


44. PLATE 87. The dotted line drawn through points A and B represents
the regression line of _____ (symbol) on _____ (symbol).


X, Y


45. PLATE 87. By use of the regression line of X on Y, you can predict
a child's chronological age if you know his _____ _____.


test score


46. PLATE 87. By use of the regression line of Y on X, you can predict
a child's test score if you know his _____ _____.


chronological age


47. PLATE 86. $\bar{X}$ = 5.12, $\bar{Y}$ = 4.75.

In the scatter diagram presented in Plate 87, locate the point
represented by these two mean scores.


see Plate 87


48. PLATE 87. $\bar{X}$ = 5.12, $\bar{Y}$ = 4.75.

The point in the scatter diagram located by the two mean scores
represents the point at which the two regression lines _____.


cross 


49. PLATE 87. The two regression lines cross at the point represented
by the means of the two variables. Because not all of the dots in this
diagram lie in a straight line, the correlation between the
two variables is _____ (perfect/less than perfect).


less than perfect


50. If the correlation between two variables is less than perfect, you can
draw two regression lines. However, if the correlation is perfect
(1.00 or -1.00), the two regression lines will be _____
(the same/different).

the same 
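
The claim that the two regression lines cross at the point located by the two means can be checked numerically. The sketch below fits both lines to the Plate 86 data and solves the two equations simultaneously; the helper name `fit_line` is an assumption of the example, not a name used in the program.

```python
def fit_line(us, vs):
    """Least-squares line predicting v from u: returns (a, b), v~ = a + b*u."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    sum_products = sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n
    sum_squares_u = sum(u * u for u in us) - sum(us) ** 2 / n
    b = sum_products / sum_squares_u
    return mean_v - b * mean_u, b


ages = [10, 8, 7, 6, 4, 3, 2, 1]      # X
scores = [7, 9, 10, 4, 2, 1, 3, 2]    # Y

a_yx, b_yx = fit_line(ages, scores)   # regression of Y on X
a_xy, b_xy = fit_line(scores, ages)   # regression of X on Y

# Solve Y = a_yx + b_yx*X and X = a_xy + b_xy*Y together.
# If b_yx * b_xy were exactly 1 the correlation would be perfect, the two
# lines would coincide, and there would be no single crossing point.
cross_x = (a_xy + b_xy * a_yx) / (1 - b_yx * b_xy)
cross_y = a_yx + b_yx * cross_x

mean_x = sum(ages) / len(ages)
mean_y = sum(scores) / len(scores)
print(round(cross_x, 3), round(cross_y, 3), mean_x, mean_y)
```

The crossing point agrees with the unrounded means, 41/8 and 38/8, which the plate reports as 5.12 and 4.75.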



EXERCISES 


1. What is a regression line? 

2. Using Formula 33b, calculate the regression coefficient for the
regression of Y on X for the following set of data.

             Inference Test    Social Studies
Student      Scores (X)        Test Scores (Y)

A            11                 4
B            12                 2
C            13                 1
D            14                 3
E            14                 5
F            15                 6
G            16                10
H            17                 4
I            18                 9
J            20                 8


3. Using Formula 33, predict the social studies test score for a student 
who receives an inference test score of 13. For one who receives 
an inference test score of 19. 

4. Draw a scatter diagram of the data presented in Exercise 2. Using 
the values obtained in Exercise 3, draw in this scatter diagram the 
regression line of Y on X. 

5. Using Formula 34b, calculate the regression coefficient for the
regression of X on Y for the data in Exercise 2.

6. Using Formula 34, predict the inference test score for a student who 
receives a social studies test score of 2. For one who receives a 
social studies test score of 7. 

7. In the scatter diagram drawn for Exercise 4, draw the regression 
line of X on Y. 


8. Identify the symbols $\tilde{Y}$, $b_{yx}$, $b_{xy}$.


set 21

t-RATIO FOR CORRELATED SAMPLES


The t-ratio was discussed in Set 14 as a method of determining the
significance of the difference between sample means; the method
presented was for testing the difference between what may be termed
independent, or uncorrelated, samples. When making mean score
comparisons, however, there is sometimes a correlation between the
two sets of scores being examined. This may occur when the two sets
of scores are derived from the same sample. For instance, when a
sixth-grade class is given a spelling test at the beginning and end of
the school year, the two sets of scores are correlated because they are
derived from the same sample of children. Correlated sets of scores also
occur when individuals in two samples are "matched" on a third variable,
such as chronological age. The samples in this case are said to be
dependent because of this matching process. They are dependent samples
because each individual in one sample has been matched with an individual
in the other sample. This set presents a method for taking into account
the degree of correlation that exists between dependent samples when
examining the significance of the difference between means, using the
t-ratio.

This set presents two methods for determining the standard error of
the difference for correlated samples and the calculation and evaluation
of the t-ratio.

SPECIFIC OBJECTIVES OF SET 21 

At the conclusion of this set you will be able to: 

(1) distinguish between dependent and independent sets of scores. 

(2) use Formulas 35 and 36 to calculate a t-ratio for data of two
dependent samples and evaluate it for level of significance.

(3) calculate the t-ratio using the direct difference method as
presented in Formula 37 and evaluate the t-ratio for level of
significance.

(4) identify the symbol D.



1. You have already been given the formula for the t test for the
difference between two independent sets of scores (see Formula 19).
The word "independent" means that the sets of scores are _____
(correlated/not correlated).


not correlated


2. When sets of scores on two variables are correlated, they are said
to be _____ (dependent/independent).


dependent


3. If you obtain arithmetic scores on two different samples of
individuals, and if the two samples are randomly selected, the two sets
of scores are _____ (dependent/independent).


independent


4. If you match the individuals in the two samples on some variable,
say chronological age or mental age, then the individuals in the two
samples are no longer _____ (dependent/independent) but are related.


independent


5. If the individual scores on one variable are matched in some manner
with the individual scores on the other variable, it is possible to
compute a _____ between these two sets of scores.


correlation


6. When two sets of scores are correlated, you must take this
correlation into account in performing a t test. Formula 35 presents the
formula for computing $s_{diff}$ for _____ samples.


correlated


7. FORMULA 35. The symbol $s_{\bar{X}_1}^2$ indicates that you _____
the standard error of the mean of Sample 1.


square


8. FORMULA 35. The expression $s_{\bar{X}_2}^2$ indicates that you square the
standard error of the mean of Sample _____.


2


9. FORMULA 35. The expression $2rs_{\bar{X}_1}s_{\bar{X}_2}$ indicates that you
multiply 2 times r times the standard error of the mean of Sample 1
times the _____ _____ of the mean of Sample 2.


standard error


10. FORMULA 35. This formula indicates that, in order to obtain $s_{diff}$,
you must subtract $2rs_{\bar{X}_1}s_{\bar{X}_2}$ from the sum of $s_{\bar{X}_1}^2$ and
$s_{\bar{X}_2}^2$ and then extract the _____ _____ of the remainder.


square root


Plate 88

Spelling Scores for Thirty-one Children Tested in January and Again in June

January Scores            June Scores

N = 31                    N = 31
$\bar{X}_1$ = 80          $\bar{X}_2$ = 90
$s_{\bar{X}_1}$ = 3       $s_{\bar{X}_2}$ = 4

r = .53


11. This plate presents data for thirty-one children who were given a
spelling test in two different months. The correlation between
these two sets of scores is r = _____.


.53


12. PLATE 88. The mean score of the January test is $\bar{X}_1$ = _____ and
the June test is $\bar{X}_2$ = _____.


80, 90


13. PLATE 88. The standard errors of the means for these two sets of
scores, as computed using Formula 17, are $s_{\bar{X}_1}$ = _____ and
$s_{\bar{X}_2}$ = _____.


3, 4


14. PLATE 88. In order to determine the standard error of the difference
between these two correlated sets of scores, substitute the numerical
values for the symbols in Formula 35.

$$s_{diff} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2 - 2rs_{\bar{X}_1}s_{\bar{X}_2}}$$


$$s_{diff} = \sqrt{(3)^2 + (4)^2 - 2(.53)(3)(4)}$$


15. PLATE 88. $s_{diff} = \sqrt{(3)^2 + (4)^2 - 2(.53)(3)(4)}$

$s_{diff}$ = _____


$\sqrt{12.28}$ = 3.50


16. PLATE 88. $s_{diff}$ = 3.50.

To determine if the spelling scores of these children increased
significantly from January to June, apply the t test, using Formula
36. Substitute the appropriate values for the symbols in this formula
and solve for t. t = _____.


$$t = \frac{90 - 80}{3.50} = 2.86$$


17. PLATE 88. t = 2.86.

Formula 36 indicates that the degrees of freedom for a correlated
t test are df = number of pairs of scores - _____.


1


18. PLATE 88. t = 2.86.

The df for this set of data is _____.


30


(31 - 1)


19. Use Table 2 to determine the values of t required for significance
with df = 30. The value of t necessary for significance at the .05
level is _____, and at the .01 level is _____.


2.042, 2.750


20. PLATE 88. t = 2.86, df = 30.

FROM TABLE 2: at P = .0500, t = 2.042; at P = .0100, t = 2.750.
The obtained t of 2.86 is greater than is necessary for significance
at the .05 level, and also greater than is required at the .01 level.
You conclude that the difference between the means is significant
beyond the _____ level.


.01 
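
The steps in frames 14 through 20 can be retraced with a short Python sketch. The function name `s_diff_correlated` and the variable names are assumptions of the example; the two critical values are simply copied from the Table 2 entries quoted in frame 19. Because the sketch does not round $s_{diff}$ to 3.50 before dividing, it gives a t of about 2.85 where the plate's rounded arithmetic gives 2.86.

```python
import math


def s_diff_correlated(se_1, se_2, r):
    """Formula 35: standard error of the difference for correlated samples."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2 - 2 * r * se_1 * se_2)


# Values from Plate 88.
mean_january, mean_june = 80, 90
se_january, se_june = 3, 4
r = 0.53
n_pairs = 31

s_diff = s_diff_correlated(se_january, se_june, r)
t = (mean_june - mean_january) / s_diff   # Formula 36
df = n_pairs - 1                          # number of pairs of scores minus 1

# Table 2 values for df = 30, as quoted in frame 19.
t_05, t_01 = 2.042, 2.750
print(round(s_diff, 2), round(t, 2), df, t > t_05, t > t_01)
```

Setting `r = 0` in the same function drops the correlation term, which shows how much the correlation between the two testings reduces the standard error of the difference.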


21. Formula 35 for the estimation of $s_{diff}$ is useful if you know the
standard error of each mean and the degree of correlation. A simpler
method of estimating $s_{diff}$ is presented in Formula 37. This method
is called the _____ _____ method.


direct difference


22. In correlated data, you have a number of pairs of scores. In order
to use Formula 37, you must obtain the _____
between each pair of scores.


difference


23. FORMULA 37. The direct difference method indicates that you take
the difference between each of the pairs of scores. The symbol that
represents a difference score is _____ (symbol).


D


24. FORMULA 37. In this formula, N represents the number of _____
of scores.


pairs


Plate 89

              First      Second
              Testing    Testing
Individual    $X_1$      $X_2$        D      $D^2$

A             4          2            2      4
B             4          2            2      4
C             3          1            2      4
D             3          4           -1      1
E             5          4            1      1
F             2          1            1      1
G             4          3            1      1
H             3          3            0      0
I             2          2            0      0
J             1          3           -2      4

N = 10    $\sum X_1 = 31$    $\sum X_2 = 25$    $\sum D = 6$    $\sum D^2 = 20$


25. This plate presents an example of the computation of a t test using
the _____ _____ method.


direct difference


26. PLATE 89. In this plate, individual A received a score on the first
testing of 4, and a score on the second testing of _____.


2


27. PLATE 89. Individual A has a difference score (D) from the first
testing to the second testing of $X_1 - X_2$ = 4 - 2 = _____.


2


28. PLATE 89. For individual A, D = 2 because his score value
decreased from the first testing to the second testing. For individual
J, D = _____.


-2


29. PLATE 89. For individual J, D = -2 because his score value
_____ (increased/decreased) from the first testing
to the second testing.


increased


30. PLATE 89. The third column of this plate presents D for each of
the pairs of scores. The fourth column presents _____ (symbol) for
each of the pairs of scores.


$D^2$


31. PLATE 89. The totals of the columns indicate the values to be used
in computing $s_{diff}$. In this example, N = _____.


10 


32. PLATE 89. In order to obtain $s_{diff}$, first you must calculate
$\sum d^2$ by use of Formula 37a. Substitute the numerical values for the
symbols in Formula 37a.

$$\sum d^2 = \sum D^2 - \frac{(\sum D)^2}{N}$$

$\sum d^2$ = _____


$$20 - \frac{(6)^2}{10} = 16.4$$


33. PLATE 89. $\sum d^2$ = 16.4.

To obtain $s_{diff}$, substitute the numerical values for the symbols in
Formula 37b and do the necessary arithmetic.

$s_{diff}$ = _____


$$\sqrt{\frac{16.4}{9}} = \sqrt{1.82} = 1.35$$


34. PLATE 89. $s_{diff}$ = 1.35.

Determine the mean for the first and second testing.

$\bar{X}_1$ = _____    $\bar{X}_2$ = _____


$$\bar{X}_1 = \frac{31}{10} = 3.1, \qquad \bar{X}_2 = \frac{25}{10} = 2.5$$


35. PLATE 89. $s_{diff}$ = 1.35, $\bar{X}_1$ = 3.1, $\bar{X}_2$ = 2.5.

Determine the value of the t-ratio by use of Formula 36.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{diff}}$$

t = _____


$$t = \frac{3.1 - 2.5}{1.35} = .44$$


36. PLATE 89. t = .44, N = 10.

Formula 36 indicates that the degrees of freedom associated with
this t test are:

df = number of _____ of scores - 1


pairs


37. PLATE 89. t = .44, N = 10. df = _____.


9


(10 - 1)


38. PLATE 89. t = .44, df = 9.

Use Table 2 to determine the value of t needed for significance with
df = 9. For P = .0500, t = _____; for P = .0100, t = _____.


2.262, 3.250


39. PLATE 89. t = .44, df = 9.

From Table 2, the value of the t-ratio needed for significance at the
.05 level is 2.262. The obtained t-ratio of .44 is not as large as that
required for significance; therefore, you conclude that the difference
between the means for the first testing and the second testing is
_____ (significant/non-significant).


non-significant 
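
The arithmetic of Plate 89 can be retraced as follows. This is an illustrative sketch, and all variable names are assumptions of the example. One caution, offered as an observation of this sketch and not as a statement from the program: the value 1.35 used in the frames equals the standard deviation of the ten difference scores. In the more common statement of the direct difference method, the standard error of the mean difference is that standard deviation divided by the square root of N, which here gives about .43 and a t of about 1.41. The sketch reports both figures; each falls short of the 2.262 quoted from Table 2, so the conclusion of frame 39 is the same either way.

```python
import math

# Scores from Plate 89.
first = [4, 4, 3, 3, 5, 2, 4, 3, 2, 1]
second = [2, 2, 1, 4, 4, 1, 3, 3, 2, 3]

diffs = [x1 - x2 for x1, x2 in zip(first, second)]   # the D column
n = len(diffs)                                       # number of pairs
sum_d = sum(diffs)
sum_d_squared = sum(d * d for d in diffs)

# Formula 37a as worked in frame 32: 20 - (6)^2 / 10.
ss_d = sum_d_squared - sum_d ** 2 / n
mean_diff = sum_d / n                                # equals 3.1 - 2.5

# The arithmetic shown in frames 33 and 35.
plate_divisor = math.sqrt(ss_d / (n - 1))
t_plate = mean_diff / plate_divisor

# Assumption of this sketch, not shown in the plate: the usual standard
# error of a mean difference also divides by the square root of N.
se_mean_diff = plate_divisor / math.sqrt(n)
t_usual = mean_diff / se_mean_diff

t_05 = 2.262   # Table 2 value for df = 9, as quoted in frame 39
print(round(plate_divisor, 2), round(t_plate, 2), round(t_usual, 2))
```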


40. In making a t test, if the two sets of scores are for the same group
of individuals, or if the individuals in the two groups have been
paired by some matching technique, the two sets of scores are said
to be _____ (correlated/not correlated).


correlated


41. In making a t test, if the two sets of scores are for different groups
of individuals who have not been paired or matched in any manner,
the two sets of scores are said to be independent of each other and,
therefore, are _____ (correlated/not correlated).


not correlated


42. To make a t test if the two sets of scores are not correlated, use
Formula 18 for computing $s_{diff}$. Use either Formula 35 or Formula
37 when the two sets of scores are _____.


correlated 







EXERCISES 


1. What is meant by a dependent set of scores? 

2. A group of sixth-grade students were given Form A and Form B of
an intelligence test. The following statistics were obtained.
Compute the t-ratio, using Formulas 35 and 36.

Form A Form B 

N-=27 N = 27 

X = 44 -X = 40 

SX=2.6 sj=3d 

r = .46 

3. In Exercise 2, assume the researcher did not predict that the scores
on one form of the test would be higher than on the other form.
Using Table 2, would you reject the null hypothesis that there was
no difference between Form A and Form B? Use P = .0500 as your
designated level for significance.

4. For one year, two groups of children were instructed in art by
different methods. Each child receiving Method A was matched by
IQ score with a child receiving Method B. At the end of the year,
each sample of children was given a creativity test. The scores
are presented below. Calculate the t-ratio by the direct difference
method using Formula 37.

Pairs of Children    Method A    Method B

1st                  42          36
2nd                  39          38
3rd                  37          32
4th                  37          31
5th                  34          25
6th                  32          28
7th                  31          21
8th                  27          20

5. The researcher in Exercise 4 had the hypothesis that Method A would
be more effective in teaching creativity than Method B. Using Table
2, would you reject the null hypothesis that there is no difference
between the two methods of teaching art? Use P = .0500 as your
designated level for significance.

6. Identify the symbol D. 





set 22 

RANK ORDER CORRELATION


In each statistical technique presented so far, the researcher has had 
to assume that the raw scores were normally distributed in the population 
from which his sample or samples were selected. If he is unable to make 
this assumption he must use statistical techniques which do not require 
such an assumption. Thus, the Pearson product-moment correlation 
coefficient presented in Set 19 is an appropriate statistical technique for 
correlation analysis only when the scores in the population from which 
the samples have been drawn are normally distributed. If this assumption 
cannot be made, the use of the Pearson product-moment correlation 
technique is inappropriate, and the evaluation of its significance may be 
erroneous. 

A correlation technique not concerned with the actual value of the raw
scores or their distribution in the population is the rank order
correlation, or rho ($\rho$). This is an appropriate statistical technique
for determining the degree of correlation between two variables when you
cannot make the assumption that they come from normally distributed
populations. As will be discussed in this set, the rank order correlation
does not take into account the value of the raw scores involved; it is
concerned only with the placement of each score in relation to the others
in the distribution. This set will present methods for ranking scores and
computing rank order correlation coefficients.

The method for evaluating the significance of the rank order correlation 
coefficient using Table 5 will be presented, as well as the basis upon which 
the null hypothesis is accepted or rejected. 

SPECIFIC OBJECTIVES FOR SET 22 

At the conclusion of this set you will be able to: 

(1) rank order a set of scores. 

(2) determine the ranks to be assigned to tied scores. 

(3) use Formula 38 to calculate the rank order correlation
coefficient (rho).

(4) use Table 5 to evaluate the significance of rho.

(5) identify the symbol $\rho$.



1. If you have a set of scores on a variable that are arranged in order
of magnitude, they are said to be ranked. Any set of scores can be
_____.


ranked


2. If the largest score in a set of ten scores is assigned a rank of 1,
the second largest score is assigned a rank of 2, and so on, then the
smallest score is assigned a rank of _____.


10


3. The term rank ordered is used when you place the scores in order
of magnitude and assign _____ to them.


ranks 


Plate 90 


Raw Scores Ranks 

47 1 

39 2 

38 3 

35 4 

31 5 

29 6 

27 7 

N = 7


4. This plate presents a set of scores that have been rank ordered.
The first column presents the raw scores and the second presents
their _____.


ranks


5. PLATE 90. Score 27 receives the rank of 7 because it is the smallest
score in the set. The largest score in the set is _____, and it receives
a rank of _____.


47, 1


6. PLATE 90. Note that the raw scores are presented in descending
order, whereas their ranks are in _____ order.


ascending


7. PLATE 90. It is customary for ranks to be assigned in this manner,
with the largest score receiving the rank of _____.


1 


Plate 91


Raw Scores    Ranks

40             1
39             2.5
39             2.5
37             4
20             5
19             6
18             8
18             8
18             8
12            10
10            11

N = 11


8. In this plate, there are _____ raw scores that have been rank ordered.


11


9. PLATE 91. Notice that there are two scores of 39 in this plate.
These two scores are occupying rank positions of 2 and _____.


3


10. PLATE 91. Because the two scores of 39 are "tied" for the second
and third ranks, they are both assigned the average of these two
ranks, which is _____.


2.5


11. Whenever there are "ties" among the raw scores, they are assigned
the average of the rank positions that they occupy. In Plate 91, the
three raw scores of 18 occupy rank positions of _____, _____, and _____.


7, 8, 9


12. PLATE 91. The three raw scores of 18 occupy rank positions of 7,
8, and 9. Therefore, all three are assigned the average of these
ranks, which is _____.


8


13. Below is a set of raw scores. Place them in rank order and assign
ranks to them.

12, 9, 4, 10, 15, 7, 6, 9 


Plate 92


Raw Scores    Ranks

15            1
12            2
10            3
 9            4.5
 9            4.5
 7            6
 6            7
 4            8

N = 8


14. PLATE 92. A rank of 4.5 is assigned to both raw scores of 9
because they occupy the rank positions of _____ and _____ in the rank
order of the raw scores.


4, 5 
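
The ranking rule used in Plates 90 through 92 (largest score ranked 1, tied scores sharing the average of the rank positions they occupy) can be sketched in Python. The function name `rank_descending` is an assumption of the example, not a name from the program.

```python
def rank_descending(scores):
    """Rank scores with the largest as 1; tied scores share the average rank."""
    ordered = sorted(scores, reverse=True)
    positions = {}
    # Collect every rank position occupied by each distinct score value.
    for position, score in enumerate(ordered, start=1):
        positions.setdefault(score, []).append(position)
    # A tied score receives the average of the positions it occupies.
    average = {score: sum(p) / len(p) for score, p in positions.items()}
    # Return the ranks in the same order as the scores were given.
    return [average[score] for score in scores]


plate_91 = [40, 39, 39, 37, 20, 19, 18, 18, 18, 12, 10]
frame_13 = [12, 9, 4, 10, 15, 7, 6, 9]   # the unordered scores of frame 13

print(rank_descending(plate_91))
print(rank_descending(frame_13))
```

The ranks come back in the order the scores were supplied, so the scores do not have to be sorted by hand first.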


15. If you have two sets of scores for a sample that have been rank
ordered, you may compute a correlation between the _____
orders.


rank 


Plate 93

              Raw Score    Raw Score    Ranks    Ranks
Individual    X            Y            X        Y         D       $D^2$

A             15            9           1        3        -2       4
B             14           11           2        2         0       0
C             12           15           3        1         2       4
D             11            8           4        4         0       0
E              7            4           5.5      5         0.5     0.25
F              7            3           5.5      6        -0.5     0.25
G              4            1           7        7         0       0

N = 7                                                      $\sum D^2 = 8.5$


16. Plate 93 presents two sets of scores for a sample, each of which
has been rank ordered. The first and second columns present the
raw scores for the two variables, and the ranks of the raw scores
are presented in the _____ and _____ columns.


third, fourth


17. PLATE 93. For individual C, the rank of his X score is 3 because
it is the third largest score for the X variable. The rank of his Y
score is _____ because it is the _____ score for the Y variable.


1, largest


18. PLATE 93. Individuals E and F both have raw scores of 7 on the X
variable. The ranks of both individuals E and F are 5.5 on this
variable because their scores occupy the _____ and _____ rank positions
in the rank order.


5, 6 


19. FORMULA 38. This formula is for the calculation of the rank order
correlation coefficient (rho), the symbol for which is _____ (symbol).


$\rho$


20. FORMULA 38. The symbol $\rho$ is the Greek letter rho, which is
pronounced "row." This is the formula used to compute the correlation
coefficient of two sets of _____ (scores/ranks).


ranks


21. FORMULA 38. The symbol D in the formula for $\rho$ is used to indicate
the _____ between the ranks of an individual.


difference


22. FORMULA 38. The symbols $6\sum D^2$ indicate that you square the
difference between the ranks of the individuals, sum all the squares,
and then multiply this sum by _____.


6


23. FORMULA 38. In this formula, the numbers 1 and 6 are "constants"
that are always used when calculating _____ (symbol).


$\rho$


24. PLATE 93. In addition to presenting the raw scores and their
ranks, this plate also presents the difference between the ranks
(D) and the square of the rank differences ($D^2$). For individual A,
D = _____ and $D^2$ = _____.


-2, 4 


25. PLATE 93. In this plate, $\sum D^2$ = _____ and N = _____.


8.5, 7


26. PLATE 93. $\sum D^2$ = 8.5, N = 7.

To obtain the rank order correlation coefficient, substitute the above
values for the symbols in Formula 38.

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$


$$\rho = 1 - \frac{6(8.5)}{7(7^2 - 1)}$$


27. PLATE 93. $\sum D^2$ = 8.5, N = 7.

$\rho = 1 - \frac{6(8.5)}{7(7^2 - 1)}$ = _____


$$\rho = 1 - \frac{51}{336} = 1 - .15 = .85$$


28. PLATE 93. The rank order correlation coefficient for this set of
ranks is $\rho$ = .85. Table 5 presents the values of $\rho$ required for
significance at the _____ and _____ levels.


.05, .01 











29. Notice that the first column in Table 4 for r is headed df, whereas
the first column in Table 5 for $\rho$ is headed _____ (symbol).


N


30. Because the $\rho$ correlation coefficient is computed on ranks rather
than raw scores, you do not need to determine df. Table 5 is entered
directly by using N, which is the number of _____ of ranks
in the correlation.


pairs


31. PLATE 93. $\rho$ = .85.

For the data in this plate, N = _____.


7


32. PLATE 93. $\rho$ = .85, N = 7.

Use Table 5 to determine the value of $\rho$ required for significance.
P = .0500, $\rho$ = _____. P = .0100, $\rho$ = _____.


.786, .929


33. PLATE 93. $\rho$ = .85, N = 7.

FROM TABLE 5: for P = .0500, $\rho$ = .786; for P = .0100, $\rho$ = .929.
The obtained $\rho$ of .85 is _____ (larger/smaller) than
required at the .05 level of significance.


larger 


34. The null hypothesis states that there is _____ correlation between
the ranks of the X and Y variables.


no


35. PLATE 93. Because the obtained $\rho$ of .85 is larger than is required
for significance at the .05 level, you _____ (accept/reject)
the null hypothesis.

reject 
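
The whole procedure for Plate 93 (rank each variable, take the rank differences, apply Formula 38, compare with Table 5) can be put together in one short sketch. The function names are assumptions of the example, and the two critical values are copied from the Table 5 entries for N = 7 quoted in frame 32.

```python
def average_ranks(scores):
    """Rank scores with the largest as 1; tied scores share the average rank."""
    ordered = sorted(scores, reverse=True)
    positions = {}
    for position, score in enumerate(ordered, start=1):
        positions.setdefault(score, []).append(position)
    return [sum(positions[s]) / len(positions[s]) for s in scores]


def rank_order_rho(xs, ys):
    """Formula 38: rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    ranks_x, ranks_y = average_ranks(xs), average_ranks(ys)
    n = len(xs)   # number of pairs of ranks
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1)), sum_d_squared


# Raw scores from Plate 93.
x = [15, 14, 12, 11, 7, 7, 4]
y = [9, 11, 15, 8, 4, 3, 1]

rho, sum_d_squared = rank_order_rho(x, y)

# Table 5 values for N = 7, as quoted in frame 32.
critical_05, critical_01 = 0.786, 0.929
print(sum_d_squared, round(rho, 2), rho > critical_05, rho > critical_01)
```

The obtained value exceeds the .05 entry but not the .01 entry, which matches the decision in frame 35 to reject the null hypothesis at the .05 level.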

36. Table 5 is for two-tailed significance tests. Therefore, the values
in this table apply to both positive and _____ rank order
correlations.


negative


EXERCISES


1. A group of soldiers were given instruction in marksmanship. A study
was conducted to determine if the number of hours spent in practice
shooting was related to proficiency. Below are presented the data
for twelve soldiers. Use Formula 38 to compute the rank order
correlation for these data.

             Hours of     Proficiency
Soldier      Practice     Rating

A            24           75
B            23           83
C            19.5         98
D            18           80
E            17           74
F            16.5         69
G            16           71
H            15           68
I            14.5         59
J            14           62
K            13.5         70
L            10           54


2. Using Table 5, determine if the rho computed in Exercise 1 is
significant. Use P = .0100 as your designated level for significance.
Would you accept or reject the null hypothesis?

3. A teacher was interested in knowing the extent to which his evaluation
of his children's cooperativeness was related to their evaluation of
themselves. He rated each child on cooperativeness, using a scale
ranging from 1, for very cooperative, to 10, for uncooperative. Each
child also rated himself, using the same scale. Below are the data
obtained for ten children. Compute the rank order correlation for
these data, using Formula 38.

         Teacher's    Child's                   Teacher's    Child's
Child    Rating       Self-rating      Child    Rating       Self-rating

A         5           4                F        8            2
B        10           2                G        4            3
C         1           3                H        4            5
D         6           1                I        6            5
E         2           3                J        6            6

4. Using Table 5, determine if the rho computed in Exercise 3 is
significant. Use P = .0500 as your designated level for significance.
Would you accept or reject the null hypothesis?

5. Identify the symbol $\rho$ (rho).


set 23

CHI SQUARE: single sample


Previous sets of this book have presented methods for analyzing data
in the form of raw scores or ranks. Another type of data with which the
researcher is concerned is frequency data. This data is not in the form
of raw scores or ranks, but is the frequency with which events occur; for
instance, the number of boys preferring one soft drink over another or
the number of college students attending a concert. Any number which
denotes how many times an occurrence takes place is called frequency
data.

Frequency data can be divided into various categories, and the
difference between the frequencies within the categories can be analyzed.
One statistical technique appropriate for examining frequency data is
called chi square ($\chi^2$).

This set presents the method for calculating chi square of the frequency
data obtained from one sample and the method of evaluating the
significance of the chi square, using Table 6.



SPECIFIC OBJECTIVES OF SET 23 


At the conclusion of this set you will be able to: 

(1) use Formula 39 to compute the chi square from frequency data 
divided into two cells. 

(2) use Formula 40 to compute chi square from frequency data 
divided into more than two cells. 

(3) use Table 6 to determine the level of significance of chi square.

(4) identify the symbols O, E, and χ².


1. Thus far you have applied statistical techniques to raw scores or
ranks of scores. You may also apply statistical tests to the frequencies
of individuals when they are categorized in various ways.

Plate 94 


Question: "Are you in favor of capital punishment?"

          f

Yes      20

No       60

N = 80


The frequency of people answering "yes" to the question in Plate
94 is _.


20 


2. PLATE 94. Out of a sample of eighty people who were asked this
question, _ answered "yes" and _ answered "no."


twenty, sixty 


3. PLATE 94. In this plate, the numbers 20 and 60 are 
(frequencies/scores). 


frequencies 


4. PLATE 94. In this example, "yes" and "no" are called categories.
Suppose you wish to answer this question: "Are the frequencies of
individuals in the two _ significantly different?"


categories


5. PLATE 94. The null hypothesis to be tested is: There is _
difference between the frequency of people in the two categories.


no 


6. PLATE 94. If there is no difference between the frequency of individuals
giving "yes" answers and the frequency giving "no" answers, you would expect
that _% of the individuals would fall into each category.


50 


7. PLATE 94. If you expect 50% of the individuals to answer "yes,"
then the expected frequency of "yes" answers for N = 80 is _.


40 


Plate 95 



        O     E     |O − E| − .5    (|O − E| − .5)²    (|O − E| − .5)²/E

Yes    20    40        19.5             380.25                9.5

No     60    40        19.5             380.25                9.5



8. The frequencies obtained in the two categories are commonly called
the observed frequencies. In this plate, the observed frequencies
are _ for the "yes" category, and _ for the "no" category.


20, 60 


9. PLATE 95. This plate presents the same data as Plate 94. The 
column marked O presents the observed frequencies, and the column 
marked E presents the_frequencies. 


expected


10. The statistical test applied to determine if the observed frequencies
are significantly different from the expected frequencies is presented
in Formula 39. This is called a _ test.


chi square 


11. FORMULA 39. The Greek symbol that represents the chi square is
_ (symbol).


χ²


12. FORMULA 39. In the formula, the symbols |O − E| − .5 indicate that
you obtain the absolute difference between the O and E for each
category, and then subtract _ from this difference.


.5 


13. FORMULA 39. For each category, you subtract .5 from the _
difference between O and E.


absolute 


14. FORMULA 39. By subtracting .5 from the absolute difference between
O and E, you are reducing this difference. This is called the
Yates correction for _.


continuity 


15. FORMULA 39. The symbols (|O − E| − .5)²/E indicate that you obtain
the absolute difference between O and E, reduce it by .5, square it,
and divide by _ (symbol).


E


16. FORMULA 39. The symbol Σ in this formula indicates that in order
to obtain χ² you _ the quotients for all the categories.


sum 


17. Plate 95 presents the application of the χ² test. The third column
of this plate presents |O − E| − .5 for each of the categories. The
fourth column presents _ for each of the categories.


(|O − E| − .5)²


18. PLATE 95. For the "yes" category: O = 20, E = 40. |O − E| − .5 =
_; (|O − E| − .5)² = _.


19.5, 380.25 


19. PLATE 95. The fifth column presents (|O − E| − .5)²/E.
This is obtained by dividing (|O − E| − .5)² by E for each of the
categories. For the "yes" category: (|O − E| − .5)²/E = _.


9.5 


20. PLATE 95. The same procedure is followed for the "no" category.
For the "no" category: (|O − E| − .5)²/E = _.


9.5 


21. FORMULA 39. This formula indicates that, in order to obtain χ²,
you must _ the quotients for all the categories.


sum


22. PLATE 95. For these data:

χ² = Σ (|O − E| − .5)²/E = 9.5 + 9.5 = _


19 
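
The arithmetic of frames 17 through 22 can be checked with a short Python sketch. The function name chi_square_yates is an illustrative assumption and does not come from the text; the expected frequencies assume the equal split stated by the null hypothesis.

```python
def chi_square_yates(observed, expected):
    """Formula 39: sum of (|O - E| - .5)^2 / E over all cells (for df = 1)."""
    total = 0.0
    for o, e in zip(observed, expected):
        total += (abs(o - e) - 0.5) ** 2 / e
    return total


observed = [20, 60]                              # "yes" and "no" frequencies, Plate 94
n = sum(observed)                                # N = 80
expected = [n / len(observed)] * len(observed)   # 40 per cell under the null hypothesis
chi2 = chi_square_yates(observed, expected)
print(round(chi2, 2))
```

The unrounded sum is 19.0125. The program rounds each quotient to 9.5 before adding, which gives the 19 used in the frames that follow.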


23. FORMULA 39. The term cell is used to indicate the number of 
classifications into which the observed frequencies are divided. In 
Plate 95 there are__ cells. 


two


24. PLATE 95. There are two cells into which the observed frequencies 
are divided. There is the "yes" cell and the "_" cell. 


no 


25. To determine if the value of the χ² indicates that the frequencies in
the two cells are significantly different, you must first determine
the _ of freedom.


degrees 


26. Recall that, when dealing with raw scores, the df is determined by
how many raw scores are free to vary. When dealing with frequencies,
as in χ², the df is determined by how many cells have
frequencies which are free to _.


vary 


27. The degrees of freedom for χ² is determined by the number of
_ which have frequencies free to vary.


cells


28. PLATE 95. There are two cells for these data. If, for the "yes"
cell, the frequency is known, then the frequency for the "no" cell is
"fixed" because the sum of the frequencies must equal 80. Thus,
for these data, one cell is not _ to _.


free, vary 


29. The df associated with χ² is the number of cells that are free to
vary. When your data are divided into two cells, the f of one cell
is free to vary; but once it is determined, the f of the other cell is
fixed (that is, not free to vary). For the data in Plate 95, df = _.


1 (2 − 1)


30. PLATE 95. For these data, df = 1 because only one cell has frequencies
free to _.


vary 


31. Table 6 presents the values of χ² needed for the .05 and .01 levels
of significance for the differing df's. For df = 1, the value of χ²
required for significance at the .05 level is _ and at the .01
level is _.


3.84, 6.64 


32. PLATE 95. χ² = 19, df = 1.

FROM TABLE 6: For P = .0500, χ² = 3.84; for P = .0100, χ² = 6.64.

Because the obtained χ² of 19 is larger than either of the two χ²
values listed in Table 6, you conclude that the difference between
the frequencies in the two categories is significant beyond the _
level.


.01


33. PLATE 95. χ² = 19, df = 1.

Because the obtained χ² of 19 is larger than is required for significance
at the .01 level, you _ (accept/reject) the hypothesis
that there is no difference between the frequencies in the "yes"
and "no" categories.


reject 


34. PLATE 95. On the basis of the rejection of the null hypothesis at 
the .01 level, you conclude that in the population from which this 

sample was selected, most people__ (approve/ 

disapprove) of capital punishment. 


disapprove 
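
The comparison against Table 6 in frames 31 through 33 can be sketched as a lookup. The dictionary holds only the first three rows of Table 6; the name significance_level and the returned labels are assumptions made for illustration. The sketch uses a strict "larger than" test, following the wording of the frames.

```python
# Critical values copied from Table 6: df -> (.05 level, .01 level).
TABLE_6 = {1: (3.84, 6.64), 2: (5.99, 9.21), 3: (7.82, 11.34)}


def significance_level(chi2, df):
    """Return '.01', '.05', or 'not significant' for an obtained chi square."""
    crit_05, crit_01 = TABLE_6[df]
    if chi2 > crit_01:
        return ".01"
    if chi2 > crit_05:
        return ".05"
    return "not significant"


print(significance_level(19, 1))
```

A result of ".01" or ".05" corresponds to rejecting the null hypothesis at that level; "not significant" corresponds to accepting it.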


35. In cases where you have more than one degree of freedom, it is not
necessary to apply the _ correction for _.


Yates, continuity



36. The Yates correction for continuity need only be applied in cases 
where you have only_degree of freedom. 


one 



37. FORMULA 40. This formula is for the calculation of χ² when you
have more than one degree of freedom. Notice that in this formula
you _ (do/do not) subtract .5 from the absolute difference
between O and E.


do not


38. Chi square tests need not be limited in use to only two cells. Plate
96 presents the frequencies of individuals' preferences for four
brands of coffee. In this sample, N = _.


Plate 96 


Question: "Which kind of coffee do you prefer?"

             O     E     O − E    (O − E)²    (O − E)²/E

Brand A     35

Brand B     28

Brand C     30

Brand D     35

N = 128




128 


39. PLATE 96. The chi square test for these data will have more than
one degree of freedom. Therefore, you will use Formula _ (39/40)
in computing the chi square.


40 


40. PLATE 96. N = 128. In testing the null hypothesis, you would expect
that the frequencies are equally divided among the four cells. This
means that your expected frequency for each cell in this plate is _.


32 


41. PLATE 96. To test the null hypothesis that there is no difference in
preference among the four brands of coffee, your expected frequency
for each brand is 32. Use Formula 40. For Brand A compute the
following:

O − E = _

(O − E)² = _

(O − E)²/E = _


3, 9, .28


42. PLATE 96. For Brand A: E = 32, O − E = 3, (O − E)² = 9,
(O − E)²/E = .28. Determine (O − E)²/E for the other three brands of
coffee.


Brand B .50, Brand C .12, Brand D .28


43. PLATE 96. (O − E)²/E:
For Brand A = .28, for Brand B = .50, for Brand C = .12, for Brand
D = .28.

FORMULA 40. In order to determine χ² you must _ the above
values.


sum


44. PLATE 96. The chi square for these data is:

χ² = .28 + .50 + .12 + .28 = _.


1.18


45. PLATE 96. χ² = 1.18.

The frequencies in this plate are categorized into _ cells.


four


46. PLATE 96. This plate has four cells. If the frequencies contained
in three of the cells are known, the frequency of the fourth cell is
fixed. Therefore, for these data, the df = _.


3


47. PLATE 96. For these data, df = 3, because only three of the four
cells have frequencies free to vary. Once these three frequencies
are known, the frequency of the fourth cell is fixed because the sum
of all the frequencies must equal _ (symbol).


N 


48. PLATE 96. χ² = 1.18, df = 3.

From Table 6, for df = 3, the value of χ² required to be significant
at the .05 level is _, and at the .01 level is _.


7.82, 11.34 


49. PLATE 96. χ² = 1.18, df = 3.

FROM TABLE 6: For P = .0500, χ² = 7.82; for P = .0100, χ² = 11.34.
Because the obtained χ² of 1.18 is not as large as either of the two
χ² values listed in Table 6, you conclude that the observed differences
among the four cells are _ (significant/non-significant).


non-significant 


50. PLATE 96. χ² = 1.18, df = 3.

Because the obtained χ² is less than is required for significance,
you _ (accept/reject) the null hypothesis that there is
no difference among the frequencies in the four cells.


accept 
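
The Plate 96 computation (frames 40 through 46) can be reproduced with a minimal Python sketch of Formula 40. The function name chi_square is an assumption made for illustration, and the expected frequencies assume the equal division stated in frame 40.

```python
def chi_square(observed, expected):
    """Formula 40: sum of (O - E)^2 / E over all cells (no Yates correction)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [35, 28, 30, 35]                      # Brands A, B, C, D from Plate 96
n = sum(observed)                                # N = 128
expected = [n / len(observed)] * len(observed)   # 32 per brand under the null hypothesis
chi2 = chi_square(observed, expected)
df = len(observed) - 1                           # the last cell is fixed by N
print(round(chi2, 4), df)
```

The unrounded value is 38/32 = 1.1875. The program's 1.18 comes from rounding each quotient (.28, .50, .12, .28) before summing; either value is far below the 7.82 required at the .05 level for df = 3.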


51. PLATE 96. By accepting the null hypothesis, you conclude that the
observed differences in the frequencies among the cells are due to
_.


sampling error


EXERCISES


1. A group of college students were shown a series of pictures illustrating
a new clothing style. After viewing the pictures, each student
was asked if he approved or disapproved of the style. Below are the
data received from fifty-four students.

Approve 35 

Disapprove 19 

Compute the chi square for the above data. Determine whether to 
use Formula 39 or 40. 

2. Using P = .0500 as your designated level for significance, use Table 
6 to evaluate the chi square obtained in Exercise 1. State the null
hypothesis being tested. On the basis of the chi square test, do you 
accept or reject the null hypothesis? 


3. A researcher wished to examine children's preferences among four
types of transportation. A sample of ninety children was randomly
selected and asked which type they preferred. The following data
were obtained:

Automobile    10

Bus           13

Train         27

Airplane      40


Compute the chi square for the above data. Determine whether to 
use Formula 39 or 40. 

4. Using P = .0500 as your acceptable level for significance, use Table 
6 to evaluate the chi square obtained in Exercise 3. State the null 
hypothesis being tested. On the basis of the chi square test, do you 
accept or reject the null hypothesis? 

5. A manager of a large manufacturing firm wished to know if the installation
of an employees' lounge had increased or decreased the
productivity of his employees. He selected a sample of forty employees
and obtained the following data on the change in their
productivity.

Productivity increased       21

No change in productivity    11

Productivity decreased        8

Compute the chi square for the above data. 

6. Using P = .0500 as your acceptable level for significance, evaluate 
the chi square obtained in Exercise 5. Would you accept or reject 
the null hypothesis? 

7. A pharmaceutical firm developed a new medicine to alleviate discomfort
of the common cold. In order to test its acceptance, a
sample of three hundred people with colds was given the medicine.
They were then asked if they thought the medicine was effective. 
Here are the data obtained. 

Yes 171 

No 129 

Compute the chi square for the above data. 

8. Using P = .0100 as your acceptable level for significance, evaluate 
the chi square obtained in Exercise 7. Would you accept or reject 
the null hypothesis? 

9. Identify the symbol χ².


set 24

CHI SQUARE: multiple samples 


The previous set presented the chi square test for frequencies obtained 
from one sample. Frequently, the researcher obtains data from more 
than one sample, and these data are divided into a number of categories 
for each sample. The chi square technique can also be applied when 
there are multiple samples providing data. This set presents the use of 
the chi square technique to test the differences among the frequencies 
for various samples categorized separately. Methods are given for 
calculating expected frequencies and determining the significance level 
of chi square, and a formula is presented which determines the degrees 
of freedom associated with the chi square. 

SPECIFIC OBJECTIVES OF SET 24 

At the conclusion of the set you will be able to: 

(1) use Formula 41 to determine expected frequencies for the cells, 
using the observed frequencies obtained from several samples.

(2) use Formula 40 to calculate the chi square for multiple samples.

(3) use Formula 42 to determine the degrees of freedom associated
with chi square for multiple samples, and Table 6 to determine
its level of significance.


1. Thus far you have been dealing with one group of individuals which
has been divided into cells according to categories of responses.
Plate 97 has three categories of responses for _ groups of
individuals.

Plate 97 

Observed Frequencies          Question: "Do you enjoy classical music?"

           Yes      Undecided    No       N_row

Boys       A 46     B 10         C 30     86

Girls      D 20     E 18         F 50     88

N_col      66       28           80       N_total = 174


two 


2. PLATE 97. A group of boys and a group of girls were asked this
question. The total number of children in the two groups was
N_total = _.


174 


3. PLATE 97. The number of boys is 86, indicated at the right of the
plate under the heading N_row, which is the sum of the three cells
in the top row. (Cells A, B, and C.) The number of girls is _,
which is the sum of the three cells in the _ row. (Cells
D, E, and F.)


88, bottom 


4. PLATE 97. Of the 174 children, 66 gave "yes" responses (total of
the first column); 28 gave "undecided" responses (total of the second
column); and _ gave "no" responses (total of the third column).


80


5. PLATE 97. Of the 86 boys, 46 answered "yes" (Cell A); 10 answered
"undecided" (Cell B); and _ answered "no" (Cell C).


30 


6 . PLATE 97. The null hypothesis for these data is: "There is no 
difference between boys and girls in the type of responses given to 
this question." Because there are three categories of responses 

and two groups of individuals, the frequencies are divided into_ 

cells. 


six 


7. PLATE 97. A chi square test can be performed to determine whether
the frequencies in the boys' cells differ significantly from the
_ in the _ cells.


frequencies, girls'


8. Plate 97 presents the observed frequencies for each of the six cells. 

In order to perform a χ² test, you must determine the _

frequency for each cell. 


expected 


9. PLATE 97. To determine the expected frequency (E) for Cell A,
first determine the N of the row in which Cell A is located.

N_row = _.


86 


10. PLATE 97. For Cell A, N_row = 86.

Next determine the N of the column in which Cell A is located.

N_col = _.



66


11. FORMULA 41. This formula can be used to determine the expected
frequency for each cell.

PLATE 97. To determine the E for Cell A, multiply N_row (containing
Cell A) by N_col (containing Cell A) and divide by _
(symbol).


N_total


12. PLATE 97. For Cell A, N_row = 86, N_col = 66.
Use Formula 41 to determine E for Cell A. E = _.


32.6

E = (86)(66)/174


13. PLATE 97. For Cell E, N_row = _ and N_col = _.


88, 28


14. PLATE 97. For Cell E, N_row = 88, N_col = 28.

Use Formula 41 to determine E for Cell E. E = _.


14.2

E = (88)(28)/174


15. For Cell A, E = 32.6; for Cell E, E = 14.2.

PLATE 97. Use Formula 41 to determine the expected frequency
(E) for each of the other cells in this plate.

Plate 98 

Expected Frequencies 


A 32.6     B 13.8     C 39.5

D 33.4     E 14.2     F 40.4















16. PLATE 98. This plate presents the expected frequency for each cell,
corresponding to the observed frequencies in Plate 97. Thus, for
Cell D, O = _ and E = _.


20, 33.4 
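
Formula 41 can be applied to every cell at once. The following Python sketch assumes the observed frequencies of Plate 97; the function name expected_frequencies is illustrative and not from the text.

```python
def expected_frequencies(observed):
    """Formula 41: E = (N_row)(N_col) / N_total for every cell of a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n_total = sum(row_totals)
    return [[r * c / n_total for c in col_totals] for r in row_totals]


plate_97 = [[46, 10, 30],    # boys:  yes, undecided, no
            [20, 18, 50]]    # girls: yes, undecided, no
expected = expected_frequencies(plate_97)
for row in expected:
    print([round(e, 1) for e in row])
```

Five of the six values round to the entries of Plate 98. Cell F works out to (88)(80)/174, about 40.46, which differs slightly from the 40.4 printed in the plate.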


17. PLATES 97 and 98. To compute the value of χ² for these frequencies,
use Formula 40 as you did before. For Cell A, O = 46 and E = 32.6.

O − E = _

(O − E)² = _

(O − E)²/E = _


13.4, 179.6, 5.51 


18. PLATES 97 and 98. For Cell A, (O − E)²/E = 5.51. Determine this
value for each of the other cells.


Cell B = 1.04; Cell C = 2.28; Cell D = 5.37; Cell E = 1.01; Cell F = 2.28


19. PLATES 97 and 98. Use Formula 40 to determine χ².
Sum (O − E)²/E for all cells. χ² = _.


17.49

(5.51 + 1.04 + 2.28 + 5.37 + 1.01 + 2.28)


20. When you have more than one group of individuals giving more than 
one category of response, the df to be used can be determined by 
Formula 42. 

df = (number of rows - 1 ) (number of columns - 1 ) 

For Plate 97, df = _ 


2 (2 − 1) (3 − 1)


21. PLATE 97. df = 2.

Thus two cells have frequencies that are free to vary. This means
that, for N_total = 174, if the f of two of the cells are known, the f
of the other _ cells is fixed.


4 


22. PLATES 97 and 98. χ² = 17.49, df = 2.

From Table 6, for df = 2, the value of χ² required for significance
at the .05 level is _, and at the .01 level is _.


5.99, 9.21 


23. PLATES 97 and 98. χ² = 17.49, df = 2.

FROM TABLE 6: For P = .0500, χ² = 5.99; for P = .0100, χ² = 9.21.
Because the obtained χ² of 17.49 is larger than either of the two χ²
values listed in Table 6, you conclude that the differences among
the frequencies in the six cells are significant beyond the _ level.


.01 
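
The whole multiple-sample procedure (Formulas 41, 40, and 42 in turn) can be put together in one short Python sketch. The function name chi_square_table is an assumption for illustration; the data are the observed frequencies of Plate 97.

```python
def chi_square_table(observed):
    """Return (chi square, df) for a rows-by-columns table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n_total   # Formula 41
            chi2 += (o - e) ** 2 / e                      # Formula 40
    df = (len(observed) - 1) * (len(observed[0]) - 1)     # Formula 42
    return chi2, df


chi2, df = chi_square_table([[46, 10, 30], [20, 18, 50]])
print(round(chi2, 2), df)
```

Because the sketch keeps the expected frequencies unrounded, it gives a chi square of about 17.51 rather than the 17.49 obtained from the rounded values in Plate 98. The conclusion is unchanged: with df = 2 the value exceeds 9.21, the .01 level in Table 6.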


24. PLATES 97 and 98. Because the obtained χ² is larger than is required
for significance at the .01 level, you _ (accept/
reject) the hypothesis that there is no difference between boys and
girls in the type of responses given to this question.


reject 


25. PLATES 97 and 98. On the basis of the rejection of the null hypothesis
at the .01 level, you conclude that in the population from
which these samples were selected, classical music is preferred
more by _ (boys/girls) than by _ (boys/girls).


boys, girls


26. The chi square test should not be used when the expected frequency
(E) in any cell is less than 5. Thus, if, in Plate 98, E for one of the
cells had been 4, the use of the χ² test would have been _
(appropriate/inappropriate).


inappropriate 


27. In order to apply the chi square test properly, the E of each cell
must be at least _.


5 
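
The rule in frames 26 and 27 is easy to check before computing chi square. This is a minimal Python sketch; the function name chi_square_is_appropriate and its minimum parameter are illustrative assumptions, and the second table is a made-up example containing an expected frequency of 4.

```python
def chi_square_is_appropriate(expected, minimum=5):
    """True when every expected frequency E is at least `minimum`."""
    return all(e >= minimum for row in expected for e in row)


plate_98 = [[32.6, 13.8, 39.5],
            [33.4, 14.2, 40.4]]
print(chi_square_is_appropriate(plate_98))

# Hypothetical table in which one cell has E = 4, as supposed in frame 26.
too_small = [[4.0, 13.8], [33.4, 14.2]]
print(chi_square_is_appropriate(too_small))
```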


28. When you have only one degree of freedom, you must use Formula 
_ (39/40) in computing χ².


39 


29. When you have more than one degree of freedom, it is not necessary 
to apply the Yates correction for continuity. Therefore, you may 
use Formula_(39/40). 


40 


30. The chi square test may be applied to any number of groups that 
have been divided into any number of categories. If you have five 
groups, each of which is divided into seven categories, your chi 
square test will consist of_cells. 


thirty-five 


31. If you have five groups, each of which has seven categories, the 

degrees of freedom that you have in your chi square test is_. 

(Use Formula 42.) 


24


EXERCISES


1. A physical education instructor wished to determine preferences of 
boys and girls for three activities. He asked a sample of forty boys 
and thirty girls to state their preferences. Here are the data he 
received. 

Basketball Volleyball Kickball 
Boys 20 5 15 

Girls 7 13 10 

Compute the chi square for the above data. 

2. Using P = .0500 as your acceptable level for significance, evaluate
the chi square obtained in Exercise 1. Would you accept or reject 
the null hypothesis? If you reject it, what can be said regarding the 
preferences of boys and girls? 

3. A television network wished to determine whether certain types of 
programs appealed to different nationality groups. An interviewer 
asked people in four nationality groups to indicate their preference 
among five types of television programs. Here are the data he 
obtained. 



                  French    German    Italian    Chinese

Program type A      20        18        16         14

Program type B      16        14        18         16

Program type C      14        14        12         18

Program type D      20        16        14         16

Program type E      18        20        14         18


Compute the chi square for the above data. 

4. Using P = .0500 as your acceptable level for significance, evaluate 
the chi square obtained in Exercise 3. Would you accept or reject 
the null hypothesis? If you reject it, what can be said about the 
preferences of different nationality groups?


FORMULAS



Formula 1 


Calculation of the Median 


Mdn - M 



S = the sum of 

= frequency below the interval which contains the Mdn 
= frequency within the interval which contains the Mdn 


Formula 2 Calculation of the Mean 




N 


Formula 3 Calculation of the Percentile 


Percentile in 
decimal form 


fu, * Vi 

N 


X = score value for which the percentile is to be computed 
= frequency below the i which contains the score 
= frequency within the i which contains the score 

Multiply the decimal form of the percentile by 100 in order to deter¬ 
mine the percentile. 


Formula 4 Calculation of the Range 

Range = H - L + 1 

H = highest score in frequency distribution 
L = lowest score in frequency distribution


Formula 5 Calculation of the Semi-interquartile Range

Q = (Q₃ − Q₁) / 2


Formula 6 Calculation of a Deviation Score

x = X − μ

Formula 7 Calculation of the Average Deviation 


A.D. = Σ|x| / N

|x| = the absolute deviation of a raw score from the mean


Formula 8 Calculation of the Standard Deviation by the Deviation Score Method

σ = √(Σx² / N)


Formula 9 Calculation of the Standard Deviation by the Raw Score Method

σ = √(ΣX² / N − μ²)


Formula 10 Calculation of a Standard Normal Deviate

Z = (X − μ) / σ


Formulas 11 and 12 Calculation of the Sum of Squared Deviations from the μ by
the Deviation Score Method and the Raw Score Method

Deviation Score Method

"sum of squares" = Σ(X − μ)² (Formula 11)

Raw Score Method

Σx² = ΣX² − (ΣX)² / N (Formula 12)


Formulas 13 through 16 Estimate of the Population Variance and Standard Deviation
from Sample Data by the Deviation Score Method
and the Raw Score Method

Deviation Score Method

Variance             s² = Σx² / (N − 1)                        (Formula 13)
Standard deviation   s = √(Σx² / (N − 1))                      (Formula 15)

Raw Score Method

Variance             s² = [NΣX² − (ΣX)²] / [N(N − 1)]          (Formula 14)
Standard deviation   s = √([NΣX² − (ΣX)²] / [N(N − 1)])        (Formula 16)


Formula 17 Estimate of the Standard Error of the Mean

s_X̄ = s / √N


Formula 18 Estimate of the Standard Error of the Difference (Pooled Variance)

s_diff = √( [(Σx₁² + Σx₂²) / (N₁ + N₂ − 2)] (1/N₁ + 1/N₂) )


Formula 19 Calculation of the t-Ratio for Independent (Non-correlated) Samples

t = (X̄₁ − X̄₂) / s_diff

df = N₁ + N₂ − 2


Formula 20 F Test, Formula for Computing the F-Ratio in the Analysis of 
Variance 

F = Mean Square between groups / Mean Square within groups = MS_b / MS_w


Formula 21 Composition of the Total Sum of Squares 

Total sum of squares = sum of squares between groups + sum of 
squares within groups. 

SS_t = SS_b + SS_w


Formulas 22 through 24 Calculation of Sum of Squares 


Total Sum of Squares              SS_t = Σ(X − X̄_t)²                           (Formula 22)

Sum of Squares within Groups      SS_w = SS_A + SS_B + SS_C (within group)      (Formula 23)

Sum of Squares between Groups     SS_b = SS_t − SS_w                            (Formula 24a)

                                  SS_b = N_A(X̄_A − X̄_t)² + N_B(X̄_B − X̄_t)² + N_C(X̄_C − X̄_t)²    (Formula 24b)


Formulas 25 through 27 Calculation of Degrees of Freedom for Analysis of
Variance

Degrees of freedom

Total             df_t = N_t − 1                  (Formula 25)

Between groups    df_b = No. of groups − 1        (Formula 26)

Within groups     df_w = N_t − No. of groups      (Formula 27)


Formulas 28 and 29 Calculation of Mean Squares

MS_b = SS_b / df_b      (Formula 28)

MS_w = SS_w / df_w      (Formula 29)


Formula 30 Computational Formulas for Sums of Squares 



            Group A    Group B    Group C    Total

            ΣX_A       ΣX_B       ΣX_C       ΣX_t

            ΣX_A²      ΣX_B²      ΣX_C²      ΣX_t²

            N_A        N_B        N_C        N_t

Step 1   Correction term                   C = (ΣX_t)² / N_t

Step 2   Total sum of squares              SS_t = ΣX_t² − C

Step 3   Sum of squares between groups     SS_b = (ΣX_A)²/N_A + (ΣX_B)²/N_B + (ΣX_C)²/N_C − C

Step 4   Sum of squares within groups      SS_w = SS_t − SS_b


Formula 31 F Test. Calculation of the F-Ratio for Comparison of Two Variance
Estimates

F = larger s² / smaller s²


Formula 32 Calculation of the Pearson Product-Moment Correlation Coefficient 

r = [NΣXY − (ΣX)(ΣY)] / √([NΣX² − (ΣX)²] [NΣY² − (ΣY)²])

df = N - 2 

N = number of pairs of scores 

Formula 33 Formulas for Calculation of the Regression of Y on X

Formula 33a   Ŷ = a + b_yx X   (regression equation)

Formula 33b   b_yx = [ΣXY − (ΣX)(ΣY)/N] / [ΣX² − (ΣX)²/N]   (regression coefficient)

Formula 33c   a = Ȳ − b_yx X̄

Ŷ = predicted value of Y

Formula 34 Formulas for Calculation of the Regression of X on Y

Formula 34a   X̂ = a + b_xy Y   (regression equation)

Formula 34b   b_xy = [ΣXY − (ΣX)(ΣY)/N] / [ΣY² − (ΣY)²/N]   (regression coefficient)

Formula 34c   a = X̄ − b_xy Ȳ

X̂ = predicted value of X








Formula 35 Estimate of the Standard Error of the Difference for Correlated Samples

s_diff = √(s_X̄₁² + s_X̄₂² − 2 r s_X̄₁ s_X̄₂)

s_X̄₁ = standard error of the mean of sample 1
s_X̄₂ = standard error of the mean of sample 2
r = correlation coefficient between sample 1 and sample 2

Formula 36 Calculation of the t-Ratio for Dependent (Correlated) Samples

t = (X̄₁ − X̄₂) / s_diff

df = number of pairs of scores − 1


Formula 37 Estimate of the Standard Error of the Difference for Correlated Samples
(Direct Difference Method)

Formula 37a   Σd² = ΣD² − (ΣD)² / N

Formula 37b   s_diff = √(Σd² / [N(N − 1)])

D = X₁ − X₂ for any pair of scores
N = number of pairs of scores



Formula 38 Calculation of the Rank Order Correlation Coefficient (rho) 


ρ = 1 − 6ΣD² / [N(N² − 1)]

D = difference between a pair of ranks
N = number of pairs of ranks


Formula 39 Calculation of the Chi Square when df = 1

χ² = Σ (|O − E| − .5)² / E

O = observed frequency

E = expected frequency

.5 = Yates correction for continuity


Formula 40 Calculation of the Chi Square when df Is Larger than 1

χ² = Σ (O − E)² / E


Formula 41 Calculation of the Expected Frequency (E) of a Cell

E = (N_row)(N_col) / N_total


Formula 42 Calculation of the Degrees of Freedom for Chi Square when There Is
More Than One Group of Individuals Giving More Than One Category
of Responses

df = (number of rows − 1) (number of columns − 1)


TABLES



TABLE 1. FUNCTIONS OF A NORMAL CURVE

Table 1 is abridged and reprinted from Tables III and VI, Allen L. Edwards, Statistical Methods
for the Behavioral Sciences, Holt, Rinehart & Winston, Inc., New York, 1954. By permission of
the author and publisher.










TABLE 2. VALUES OF t AT THE .10, .05, .02, AND .01 
LEVELS OF SIGNIFICANCE 


df      P = .10    P = .05    P = .02    P = .01

 1       6.314     12.706     31.821     63.657
 2       2.920      4.303      6.965      9.925
 3       2.353      3.182      4.541      5.841
 4       2.132      2.776      3.747      4.604
 5       2.015      2.571      3.365      4.032
 6       1.943      2.447      3.143      3.707
 7       1.895      2.365      2.998      3.499
 8       1.860      2.306      2.896      3.355
 9       1.833      2.262      2.821      3.250
10       1.812      2.228      2.764      3.169
11       1.796      2.201      2.718      3.106
12       1.782      2.179      2.681      3.055
13       1.771      2.160      2.650      3.012
14       1.761      2.145      2.624      2.977
15       1.753      2.131      2.602      2.947
16       1.746      2.120      2.583      2.921
17       1.740      2.110      2.567      2.898
18       1.734      2.101      2.552      2.878
19       1.729      2.093      2.539      2.861
20       1.725      2.086      2.528      2.845
21       1.721      2.080      2.518      2.831
22       1.717      2.074      2.508      2.819
23       1.714      2.069      2.500      2.807
24       1.711      2.064      2.492      2.797
25       1.708      2.060      2.485      2.787
26       1.706      2.056      2.479      2.779
27       1.703      2.052      2.473      2.771
28       1.701      2.048      2.467      2.763
29       1.699      2.045      2.462      2.756
30       1.697      2.042      2.457      2.750
60       1.671      2.000      2.390      2.660
∞        1.645      1.960      2.326      2.576


Table 2 is taken from Table 3 of Fisher & Yates: Statistical Tables for Biological,
Agricultural and Medical Research, published by Oliver & Boyd Limited, Edinburgh,
and by permission of the authors and publishers.

TABLE 3. F-RATIOS FOR .05 (ROMAN) AND .01 (BOLDFACE) LEVELS OF SIGNIFICANCE

4.41 3.55 3.16 2.93 2.77 2.66 2.51 2.34 2.15 1.92

Table 3 is abridged from Table F of H. E. Garrett, Statistics in Psychology and Education, 5th edition,
David McKay Co., Inc., New York, 1958, by permission of the author and publishers.




TABLE 4. VALUES OF r AT THE .10, .05, .02, AND .01 
LEVELS OF SIGNIFICANCE 


df      P = .1000    P = .0500    P = .0200    P = .0100

  1       .988         .997         .9995        .9999
  2       .900         .950         .980         .990
  3       .805         .878         .934         .959
  4       .729         .811         .882         .917
  5       .669         .754         .833         .874
  6       .622         .707         .789         .834
  7       .582         .666         .750         .798
  8       .549         .632         .716         .765
  9       .521         .602         .685         .735
 10       .497         .576         .658         .708
 11       .476         .553         .634         .684
 12       .458         .532         .612         .661
 13       .441         .514         .592         .641
 14       .426         .497         .574         .623
 15       .412         .482         .558         .606
 16       .400         .468         .542         .590
 17       .389         .456         .528         .575
 18       .378         .444         .516         .561
 19       .369         .433         .503         .549
 20       .360         .423         .492         .537
 21       .352         .413         .482         .526
 22       .344         .404         .472         .515
 23       .337         .396         .462         .505
 24       .330         .388         .453         .496
 25       .323         .381         .445         .487
 26       .317         .374         .437         .479
 27       .311         .367         .430         .471
 28       .306         .361         .423         .463
 29       .301         .355         .416         .456
 30       .296         .349         .409         .449
 35       .275         .325         .381         .418
 40       .257         .304         .358         .393
 45       .243         .288         .338         .372
 50       .231         .273         .322         .354
 60       .211         .250         .295         .325
 70       .195         .232         .274         .302
 80       .183         .217         .256         .283
 90       .173         .205         .242         .267
100       .164         .195         .230         .254

Table 4 is taken from Table V.A. of Fisher: Statistical Methods for Research Workers,
published by Oliver & Boyd Limited, Edinburgh, and by permission of the author and
publishers.

TABLE 4 (SUPPLEMENT). ADDITIONAL VALUES OF r AT THE 5 AND 1%
LEVELS OF SIGNIFICANCE


df     P = .0500   P = .0100

32       .339        .436
34       .329        .424
36       .320        .413
38       .312        .403
42       .297        .384
44       .291        .376
46       .284        .368
48       .279        .361
55       .261        .338
65       .241        .313
75       .224        .292
85       .211        .275
95       .200        .260
125      .174        .228
150      .159        .208
175      .148        .193
200      .138        .181
300      .113        .148
400      .098        .128
500      .088        .115
1000     .062        .081


Additional values of r are reprinted from Table VI, Allen L. Edwards, Statistical 
Methods for the Behavioral Sciences. Holt, Rinehart & Winston, Inc., New York, 1954. 
By permission of the author and publisher. 
TABLE 5. VALUES OF ρ (RANK ORDER CORRELATION COEFFICIENT)
AT THE .05 AND .01 LEVELS OF SIGNIFICANCE


N      P = .0500   P = .0100

5        1.000
6         .886       1.000
7         .786        .929
8         .738        .881
9         .683        .833
10        .648        .794
12        .591        .777
14        .544        .715
16        .506        .665
18        .475        .625
20        .450        .591
22        .428        .562
24        .409        .537
26        .392        .515
28        .377        .496
30        .364        .478


Computed from Olds, E. G., “Distribution of the Sum of Squares of Rank Differences 
for Small Numbers of Individuals,” Annals of Mathematical Statistics, 1938, IX, 
133-148, and the 5% significance levels for sums of squares of rank differences and a 
correction, Annals of Mathematical Statistics, 1949, XX, 117-118, by permission of 
the Institute of Mathematical Statistics. Table 5 is taken from Elementary Statistics, 
Underwood et al, by permission of the publisher, Appleton-Century-Crofts. 
TABLE 6. VALUES OF CHI SQUARE (χ²) AT THE .05 AND .01
LEVELS OF SIGNIFICANCE

df P = .0500 P = .0100 

1 3.84 6.64 

2 5.99 9.21 

3 7.82 11.34 

4 9.49 13.28 

5 11.07 15.09 

6 12.59 16.81 

7 14.07 18.48 

8 15.51 20.09 

9 16.92 21.67 

10 18.31 23.21 

11 19.68 24.72 

12 21.03 26.22 

13 22.36 27.69 

14 23.68 29.14 

15 25.00 30.58 

16 26.30 32.00 

17 27.59 33.41 

18 28.87 34.80 

19 30.14 36.19 

20 31.41 37.57 

21 32.67 38.93 

22 33.92 40.29 

23 35.17 41.64 

24 36.42 42.98 

25 37.65 44.31 

26 38.88 45.64 

27 40.11 46.96 

28 41.34 48.28 

29 42.56 49.59 

30 43.77 50.89 

Table 6 is taken from Table 4 of Fisher & Yates, Statistical Tables for Biological,
Agricultural and Medical Research, published by Oliver & Boyd Limited, Edinburgh,
and by permission of the authors and publishers.
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ANSWERS TO EXERCISES 



Set 1 


1.   X     f
    15     1
    14     0
    13     0
    12     1
    11     2
    10     1
     9     2
     8     2
     7     1
     6     1
     5     2
     4     0
     3     0
     2     1
     1     1
    N = 15

2.    i       f
    13-15     1
    10-12     4
     7-9      5
     4-6      3
     1-3      2

3.    i       f
    41-44     2
    37-40     4
    33-36     6
    29-32     3
    25-28     0
    21-24     2
    N = 17

4. 8.5; 106.5; .5; 35.5.

5. 10.5; .5; 3.5.

6. X = raw score, f = frequency, N = number of scores, i = class interval,
ll = real lower limit.


Set 2 

1. The mode is the most recurring or frequent score or class interval 
in a frequency distribution. 

2. The median is the midpoint or center of the frequency distribution. 
It is the point above which and below which 50% of the raw scores 
lie. The symbol for median is Mdn. 

3. (a) Mode = 9, Mdn = 8; (b) Mode = 109, Mdn = 107.7; (c) Mode =
83-86, Mdn = 85.17; (d) Mode = 10-19, Mdn = 22.5.

4. Σ = the sum of, Σf_b = the sum of frequencies below the interval
which contains the median, Σf_w = the frequency within the interval
which contains the median.
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Set 3 


1. The mean of a frequency distribution is the arithmetic average of
the raw scores. Its symbol is μ.

2. 3, 5.5, 105.5, 82.

3. (a) μ = 6.36, (b) μ = 57.24, (c) μ = 14.79, (d) μ = 50.88.


4.   X     f    (X-50)    fX
    59     3       9      27
    58     4       8      32
    57     6       7      42
    56     2       6      12
    55     2       5      10
         N = 17          123

   $\mu = \frac{123}{17} + 50 = 57.24$

5.    i       f    (X-40)    fX
    57-60     2     18.5     37
    53-56     5     14.5     72.5
    49-52     9     10.5     94.5
    45-48     3      6.5     19.5
    41-44     2      2.5      5
            N = 21          228.5

   $\mu = \frac{228.5}{21} + 40 = 50.88$
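
The shortcut used in Exercises 4 and 5 (subtract a constant, average the reduced scores, then add the constant back) can be checked with a short Python sketch. The function name `coded_mean` is an illustrative assumption, not notation from the text, and the midpoints in the second call are simply the class-interval midpoints implied by the (X-40) column.

```python
def coded_mean(scores, freqs, constant):
    """Mean of a frequency distribution computed on scores reduced by a constant."""
    n = sum(freqs)
    # Sum of f * (X - constant), the "fX" column of the worked answers
    coded_sum = sum(f * (x - constant) for x, f in zip(scores, freqs))
    # Add the constant back to recover the mean of the original scores
    return coded_sum / n + constant

# Exercise 4: raw scores 59..55, constant 50
mean_4 = coded_mean([59, 58, 57, 56, 55], [3, 4, 6, 2, 2], 50)

# Exercise 5: interval midpoints 58.5..42.5, constant 40
mean_5 = coded_mean([58.5, 54.5, 50.5, 46.5, 42.5], [2, 5, 9, 3, 2], 40)

print(round(mean_4, 2), round(mean_5, 2))
```

Subtracting a constant only shifts every score by the same amount, which is why adding it back at the end recovers the mean of the original scores.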




Set 4 

1.   X    (X+5)    f    fX
     2      7      2    14
     1      6      3    18
     0      5      5    25
    -1      4      4    16
    -2      3      3     9
    -3      2      3     6
    -4      1      1     1
               N = 21   89

   $\mu = \frac{89}{21} - 5 = -.77$

2. (a) Mean, (b) Median and Mode.




3. This means that 80% of the total group received language achieve¬ 
ment scores below John Jones, and 20% received scores higher than 
his. 
4. 25th percentile, 50th percentile, 75th percentile, Q1, Q2, Q3.

5. 10th percentile, 40th percentile, 60th percentile, D6.


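6. (a) 40th percentile = $11.5 + \left(\frac{8 - 7}{6}\right)1 = 11.67$;

Q3 = 75th percentile = $12.5 + \left(\frac{15 - 13}{4}\right)1 = 13.00$;

D6 = 60th percentile = $11.5 + \left(\frac{12 - 7}{6}\right)1 = 12.33$.

(b) 40th percentile = $8.5 + \left(\frac{10 - 6}{9}\right)4 = 10.28$;

Q3 = 75th percentile = $12.5 + \left(\frac{18.75 - 15}{6}\right)4 = 15$;

D6 = 60th percentile = $8.5 + \left(\frac{15 - 6}{9}\right)4 = 12.5$.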


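7. For score value 11: $\frac{\left(\frac{11 - 10.5}{1}\right)3 + 4}{20} \times 100$ = 27.5th percentile.

For score value 13: $\frac{\left(\frac{13 - 12.5}{1}\right)4 + 13}{20} \times 100$ = 75th percentile.

For score value 14: $\frac{\left(\frac{14 - 13.5}{1}\right)2 + 17}{20} \times 100$ = 90th percentile.

8. For score value 6: $\frac{\left(\frac{6 - 4.5}{4}\right)4 + 2}{25} \times 100$ = 14th percentile.

For score value 12: $\frac{\left(\frac{12 - 8.5}{4}\right)9 + 6}{25} \times 100$ = 55.5th percentile.

For score value 17: $\frac{\left(\frac{17 - 16.5}{4}\right)4 + 21}{25} \times 100$ = 86th percentile.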
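
The percentile answers in Exercises 6 to 8 all follow the same two interpolation rules, which a short Python sketch makes explicit. The function and argument names are illustrative assumptions, and the interval figures in the calls (lower limit 12.5 or 13.5, width 1, within-interval frequencies 4 and 2, cumulative frequencies 13 and 17, N = 20) are read from the surviving parts of the worked answers rather than from the original exercise data.

```python
def percentile_point(lower_limit, width, f_within, f_below, n, pct):
    """Score value below which pct percent of the scores fall (grouped data)."""
    target = pct / 100 * n                      # number of scores below the point
    return lower_limit + ((target - f_below) / f_within) * width

def percentile_rank(score, lower_limit, width, f_within, f_below, n):
    """Percentage of scores falling below a given score value (grouped data)."""
    within = ((score - lower_limit) / width) * f_within
    return (within + f_below) / n * 100

# Q3 (75th percentile) for the distribution with N = 20
q3 = percentile_point(12.5, 1, 4, 13, 20, 75)

# Percentile ranks of score values 13 and 14 in the same distribution
rank_13 = percentile_rank(13, 12.5, 1, 4, 13, 20)
rank_14 = percentile_rank(14, 13.5, 1, 2, 17, 20)

print(round(q3, 2), round(rank_13, 1), round(rank_14, 1))
```

The two functions are inverses of each other within an interval: feeding the 75th percentile point back into `percentile_rank` returns 75.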
2. (a) Range = 8, Q = 1.27; (b) Range = 35, Q = 7.00; (c) Range = 7,
Q = 1.13; (d) Range = 28, Q = 5.11.

3. (a), (c), and (d) 

4. (b) 

5. (c) This frequency polygon is skewed to the right because the slope 
of the curve trails off in that direction. 

6. (d)

7. (a) Yes. The range will be decreased by the same amount as the
extreme score is decreased. (b) No. The semi-interquartile
range will remain the same. The value of an extreme score does
not affect it.

8. The mean is most affected by the degree of skewness in a distribution.
The mode is not affected.


Set 6 

1. The symbol x represents a deviation score. The symbol |x| represents
an absolute deviation score.

2. The term absolute deviation score means the amount that a raw
score deviates from the μ, disregarding the direction of the deviation.
It is calculated by subtracting the value of the raw score
from the value of the μ and disregarding the sign.


3. (a)  X     f    fX      x    |x|   f|x|
       10     1    10      3     3      3
        9     2    18      2     2      4
        8     5    40      1     1      5
        7     4    28      0     0      0
        6     3    18     -1     1      3
        5     3    15     -2     2      6
        4     1     4     -3     3      3
           N = 19  133             Σf|x| = 24

   μ = 7

   $A.D. = \frac{24}{19} = 1.26$
   (b)  X     f    fX      x     |x|    f|x|
       51     1    51     3.5    3.5    3.5
       50     1    50     2.5    2.5    2.5
       49     1    49     1.5    1.5    1.5
       48     4   192     0.5    0.5    2.0
       47     3   141    -0.5    0.5    1.5
       46     2    92    -1.5    1.5    3.0
       45     2    90    -2.5    2.5    5.0
           N = 14  665              Σf|x| = 19.0

   μ = 47.5

   $A.D. = \frac{19}{14} = 1.36$

4. In a normal distribution, the μ, Mdn, and Mode are identical and
are located at the modal point of the distribution.

5. The symbol σ represents the standard deviation in a normal
distribution.

6. (a) 34.13%, (b) 68.26%, (c) 4.54%, (d) 0.11%.
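
The average deviation worked out in Exercise 3 can be verified with a brief Python sketch. The function name `average_deviation` is an illustrative assumption; the score and frequency lists are the X and f columns of parts (a) and (b).

```python
def average_deviation(scores, freqs):
    """Return (mean, average deviation) for a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(scores, freqs)) / n
    # Sum of f * |x|, where x is the deviation of each score from the mean
    total_abs_dev = sum(f * abs(x - mean) for x, f in zip(scores, freqs))
    return mean, total_abs_dev / n

mean_a, ad_a = average_deviation([10, 9, 8, 7, 6, 5, 4], [1, 2, 5, 4, 3, 3, 1])
mean_b, ad_b = average_deviation([51, 50, 49, 48, 47, 46, 45], [1, 1, 1, 4, 3, 2, 2])

print(mean_a, round(ad_a, 2))
print(mean_b, round(ad_b, 2))
```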


Set 7 








1. (a)  X     f    fX      x     x²    fx²
       10     1    10      4     16     16
        9     2    18      3      9     18
        8     3    24      2      4     12
        7     4    28      1      1      4
        6     5    30      0      0      0
        5     4    20     -1      1      4
        4     3    12     -2      4     12
        3     2     6     -3      9     18
        2     1     2     -4     16     16
           N = 25  150             Σfx² = 100

   μ = 6

   $\sigma = \sqrt{\frac{100}{25}} = 2$

   (b)  X     f    fX     X²     fX²
       20     1    20    400     400
       19     4    76    361    1444
       18     6   108    324    1944
       17     3    51    289     867
       16     2    32    256     512
       15     1    15    225     225
           N = 17  302          5392

   μ = 17.76
2. The "standard normal deviate" denotes the amount a raw score
deviates from the μ, expressed in standard deviation units. The
symbol for the "standard normal deviate" is z.

3. For score 9 in 1(a), z = 1.5; for score 16 in 1(b), z = -1.32.

4. Proportion of scores between score values 5 and 8 in 1(a) is .5328; 

proportion of scores between score values 16 and 18 in 1(b) is .4780. 

5. Proportion of scores that lie above score value 9 in 1(a) is .0668; 
proportion of scores that lie below score value 17 in 1(b) is .2843. 
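
For distribution 1(a), the standard deviation, the z value for score 9, and the two normal-curve proportions can be reproduced with a short Python sketch. The function names are illustrative assumptions, and the normal-curve areas are computed from the error function in the standard library rather than looked up in Table 1, so small rounding differences from the printed table are possible.

```python
import math

def population_sd(scores, freqs):
    """Return (mean, standard deviation) by the deviation score method."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(scores, freqs)) / n
    sum_fx2 = sum(f * (x - mean) ** 2 for x, f in zip(scores, freqs))
    return mean, math.sqrt(sum_fx2 / n)

def normal_cdf(z):
    """Proportion of a normal distribution lying below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd = population_sd([10, 9, 8, 7, 6, 5, 4, 3, 2], [1, 2, 3, 4, 5, 4, 3, 2, 1])

z_9 = (9 - mean) / sd                                             # score 9
between_5_and_8 = normal_cdf((8 - mean) / sd) - normal_cdf((5 - mean) / sd)
above_9 = 1 - normal_cdf(z_9)

print(mean, sd, z_9, round(between_5_and_8, 4), round(above_9, 4))
```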

Set 8 

1. The symbol P represents probability. 

2. P=.5000. 

3. P=.1500. 

4. P=.0500. 

5. (a) P = .3413, (b) P = .4834, (c) P = .7187, (d) P = .0279. 

6. The term population is used to denote all of the individuals with a 

given characteristic. The term sample is used to denote any portion 

of a population. 

7. Population characteristics are called parameters. Sample characteristics
are called statistics.

8. Population parameters are represented by Greek letters. Sample
statistics are represented by Roman letters.

9. The symbol μ represents the mean of a population. The symbol σ
represents the standard deviation of a population. The symbol X̄
represents the mean of a sample. The symbol s represents the
standard deviation of a sample.


Set 9 

1. The "standard error of the mean" indicates the amount of variability
among sample means expressed in standard deviation units. Its
symbol is σ_X̄.

2. (a) P = .4772, (b) P = .8185, (c) P = .6713. 
3. (a) P = .3413, (b) P = .4082, (c) P = .3779.

4. The standard error of the mean is reduced when the size of the 
sample is increased. 


Set 10 


1. (a) Σx² = 66. Identical answers for deviation score method and raw
score method. (b) Σx² = 1170. Identical answers for deviation score
method and raw score method.

2. (a) df = 15; (b) df = 19. 

3. The standard deviation is the square root of the variance. 


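4. (a) $s = \sqrt{\frac{66}{15}} = 2.10$; (b) $s = \sqrt{\frac{1170}{19}} = 7.85$.

5. (a) $s = \sqrt{\frac{16(850) - (112)^2}{16(15)}} = 2.10$; (b) $s = \sqrt{\frac{20(5670) - (300)^2}{20(19)}} = 7.85$.

6. s² = estimate of a population variance, σ² = population variance,
df = degrees of freedom.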
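
The agreement between the deviation score method and the raw score method in Exercises 1, 4, and 5 can be confirmed with a small Python sketch. The function names are illustrative assumptions; the inputs (N, ΣX, ΣX²) are the figures that appear in the raw score formulas of Exercise 5.

```python
import math

def sum_of_squares(n, sum_x, sum_x2):
    """Sum of squared deviations, obtained from raw score totals."""
    return sum_x2 - sum_x ** 2 / n

def sample_sd_raw(n, sum_x, sum_x2):
    """Estimate of the population standard deviation by the raw score method."""
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

ss_a = sum_of_squares(16, 112, 850)      # part (a)
ss_b = sum_of_squares(20, 300, 5670)     # part (b)

s_a = sample_sd_raw(16, 112, 850)
s_b = sample_sd_raw(20, 300, 5670)

# Deviation score method: square root of (sum of squares / degrees of freedom)
s_a_dev = math.sqrt(ss_a / (16 - 1))
s_b_dev = math.sqrt(ss_b / (20 - 1))

print(ss_a, ss_b, round(s_a, 2), round(s_b, 2))
```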


Set 11 

1. (a) s_X̄ = .5; (b) s_X̄ = 1.75.

2. P = .4772, P = .6826, P = .7210; 5% confidence interval is from 
14.02 to 15.98. 

3. P = .4314, P = .0436, P = .0294; 5% confidence interval is from 
196.57 to 203.43. 

4. Researchers generally will accept a hypothesis as being correct if 
the probability of its being incorrect is only P = .0500. 

5. Estimate of the standard error of the mean. 
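
The 5% confidence intervals quoted in Exercises 2 and 3 are consistent with the usual large-sample rule of the sample mean plus or minus 1.96 standard errors. The sketch below assumes sample means of 15 and 200, which are simply the midpoints of the printed intervals and are not stated in the answers themselves; the function name is likewise an illustrative assumption.

```python
def confidence_interval(sample_mean, standard_error, z=1.96):
    """Large-sample confidence interval: mean plus or minus z standard errors."""
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Assumed means (interval midpoints) with the standard errors from Exercise 1
low_2, high_2 = confidence_interval(15, 0.5)
low_3, high_3 = confidence_interval(200, 1.75)

print(round(low_2, 2), round(high_2, 2))
print(round(low_3, 2), round(high_3, 2))
```

Passing `z=2.58` instead of the default gives the corresponding 1% interval for a large sample.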


Set 12 

1 . The 5% and the 1% confidence intervals. The 1% confidence interval 
is the more stringent. 
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2. (a) 1% confidence interval, 98.27 to 101.73; (b) 1% confidence inter¬ 
val, 73.32 to 76.68. 

3. Table 2 for the values of t should be used for estimating sy from
small samples. Table 1 for the normal curve functions should be
used for large samples.

4. (a) 5% confidence interval, 42.67 to 57.33. 1% confidence interval, 
39.99 to 60.01. (b) 5% confidence interval, 23.55 to 41.65. 1% con¬ 
fidence interval, 19.83 to 45.37. 

5. The t distribution describes the distribution of sample X̄'s for small
samples. For small samples, the t distribution is "flatter" than
the normal distribution. As the size of the sample becomes larger, the
t distribution more nearly resembles the normal distribution.


Set 13 

1. There is no difference in the final examination scores of freshmen 
who are taught elementary statistics by the two different methods. 

2. There is no difference between the final algebra examination scores 
of children who study in the morning and those who study in the 
evening. 

3. The standard error of the difference is the standard deviation of 
differences between sample means. 

4. You should accept the null hypothesis. 

5. Accepting the null hypothesis means that the difference between 
your two sample means is a result of sampling error and does not 
represent a real difference. 

6. The symbol σ_diff represents the standard error of the difference
between sample means.

Set 14 



2. t = 2.4/.95 = 2.53. For df = 30, P = .0500, t = 2.042; P = .0100,
t = 2.750. Because the t-ratio exceeds that required for P = .0500,
the null hypothesis should be rejected at that level. Because the
t-ratio does not exceed that required for P = .0100, the null hypothesis
should not be rejected at that level.


3. $s_{diff} = \sqrt{\left(\frac{77 + 98}{11 + 17 - 2}\right)\left(\frac{1}{11} + \frac{1}{17}\right)} = 1.00$


4. t = 3.3/1.00 = 3.3. For df = 26, P = .0500, t = 2.056; P = .0100,
t = 2.779. Because the t-ratio exceeds that required for both P =
.0500 and P = .0100, the null hypothesis should be rejected at P =
.0100.


5. s_diff is the symbol for the standard error of the difference between
sample means.
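
The pooled standard error of the difference in Exercise 3 and the t-ratio in Exercise 4 can be retraced with a short Python sketch. The function name is an illustrative assumption, and the reading of 77 and 98 as the two sums of squares and of 11 and 17 as the two sample sizes follows the layout of the formula in Exercise 3. Because the sketch keeps the unrounded standard error, its t-ratio differs slightly from the printed 3.3.

```python
import math

def pooled_se_diff(ss_1, ss_2, n_1, n_2):
    """Standard error of the difference between two independent sample means."""
    pooled_variance = (ss_1 + ss_2) / (n_1 + n_2 - 2)
    return math.sqrt(pooled_variance * (1 / n_1 + 1 / n_2))

se_diff = pooled_se_diff(77, 98, 11, 17)
df = 11 + 17 - 2

# Difference between the sample means, taken from the numerator in Exercise 4
t_ratio = 3.3 / se_diff

print(round(se_diff, 2), df, round(t_ratio, 2))
```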


Set 15 

1. A one-tailed test is appropriate for research in which the direction 
of the difference is hypothesized. A two-tailed test is appropriate 
when the direction of the difference is not hypothesized. 

2. (a) One-tailed test, (b) One-tailed test, (c) Two-tailed test, 
(d) Two-tailed test. 

3. For a two-tailed test, t = 2.179; for a one-tailed test, t = 1.782.

4. For a two-tailed test, t = 2.807; for a one-tailed test, t = 2.500.

5. A two-tailed test requires a larger t-ratio than a one-tailed test.

6. A Type I error occurs if you reject the null hypothesis when, in fact,
no difference exists.

7. A Type II error occurs if you accept the null hypothesis when, in
fact, a true difference exists.

8. The smaller the level of significance, the less probability there is
that you have made a Type I error by rejecting the null hypothesis.
The smaller the level of significance, the more probability there is
that you have made a Type II error by accepting the null hypothesis.


Set 16 

1 . The total sum of squares is divided into the sum of squares within 
groups and the sum of squares between groups. 
2. SS_t = 611; SS_b = 180; SS_w = 431.

3. Between groups: df_b = 2, MS_b = 90. Within groups: df_w = 27, MS_w =
15.96.

4. The symbol SS_b represents the sum of squares between groups; the
symbol SS_w represents the sum of squares within groups; the symbol
SS_t represents the total sum of squares; the symbol MS represents
the Mean Square, which is the term used to describe the variance in
the analysis-of-variance technique.

Set 17 

1. 90/15.96 = 5.64. 

2. Entering Table 3 with 2 and 25 degrees of freedom, for significance
at P = .0100 an F-ratio of 5.57 is required. The obtained F-ratio
of 5.64 exceeds the required value for significance at P = .0100.
On the basis of the F test, you should reject the null hypothesis at
P = .0100.

3. F = 215/72 = 2.986. For significance at P = .01, at df's of 24 and
40, an F-ratio of 2.29 is required. Therefore, the null hypothesis
that there is no difference in the variability of the scores in the two
groups should be rejected.

4. The symbol F represents the ratio between two variance estimates.
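
The chain from sums of squares to mean squares to the F-ratio (Set 16, Exercises 2 and 3, and Set 17, Exercise 1) can be laid out in a few lines of Python. The variable and function names are illustrative assumptions; the sums of squares and degrees of freedom are the values given in the answers.

```python
def mean_square(sum_of_squares, df):
    """Mean square: a sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df

ss_total, ss_between, ss_within = 611, 180, 431
df_between, df_within = 2, 27

ms_between = mean_square(ss_between, df_between)
ms_within = mean_square(ss_within, df_within)
f_ratio = ms_between / ms_within

print(ms_between, round(ms_within, 2), round(f_ratio, 2))
```

The first check worth making on any such table is that the between-groups and within-groups sums of squares add up to the total, as they do here (180 + 431 = 611).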


Set 18 


1. [Scatter diagram: English scores (Y), 91 to 97, plotted against algebra scores (X), 79 to 86.]
2. A perfect positive correlation between two variables means that
for every increase in score value on one variable there is a corresponding
increase in score value on the other variable. A moderate
positive correlation between two variables means that every increase
in score value on one variable is not perfectly matched with
a corresponding increase on the other variable, although they generally
increase.

3. A perfect negative correlation between two variables means that
for every increase in score value on one variable there is a corresponding
decrease in score value on the other variable. A moderate
negative correlation between two variables means that every increase
in score value on one variable is not perfectly matched with
a corresponding decrease on the other variable, although they generally
decrease.

4. A positive correlation. It is not perfect. 

5 . A zero correlation means that the scores on one variable are not 
related, either negatively or positively, to the scores on the other 
variable. 


Set 19 

1. Perfect positive correlation: r = 1.00; perfect negative correlation: 
r = -1.00. 


2. $r = \frac{8(1846) - (120)(116)}{\sqrt{[8(1886) - (120)^2][8(1916) - (116)^2]}}$

3. It requires a two-tailed test of significance. For df = 6: P = .0500,
r = .707; P = .0100, r = .834.

4. There is no relationship between arithmetic scores and English 
scores. Because the obtained r does exceed that required at P = 
.0500, the null hypothesis should be rejected at that level. 


5. $r = \frac{11(3757) - (276)(142)}{\sqrt{[11(7152) - (276)^2][11(2080) - (142)^2]}}$

6. It requires a one-tailed test of significance. For df = 9: P = .0500,
r = .521; P = .0100, r = .685.

7. There is no relationship between spelling scores and English scores.
Because the obtained r exceeds that required at P = .0100, the null
hypothesis should be rejected at that level.

8. The symbol r represents the Pearson product-moment correlation
coefficient.
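
The raw score formula for r used in Exercise 5 can be evaluated with a short Python sketch. The function name `pearson_r` and its argument names are illustrative assumptions; the totals passed in (N = 11, ΣXY = 3757, ΣX = 276, ΣY = 142, ΣX² = 7152, ΣY² = 2080) are read from the formula as printed.

```python
import math

def pearson_r(n, sum_xy, sum_x, sum_y, sum_x2, sum_y2):
    """Pearson product-moment correlation from raw score totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

r = pearson_r(11, 3757, 276, 142, 7152, 2080)
df = 11 - 2   # degrees of freedom for testing r

print(round(r, 3), df)
```

With df = 9, the computed r can be compared directly with the one-tailed values quoted in Exercise 6 (.521 and .685).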


Set 20 

1. A regression line is the line drawn in a scatter diagram which "best
fits" the correlated data for two variables. It is the line from which
you can predict the score value on one variable for any selected
value on the other variable.

2. Regression of Y on X: $b_{yx} = \frac{833 - \frac{(150)(52)}{10}}{2320 - \frac{(150)^2}{10}} = .757$

3. For regression of Y on X: a = 5.2 - (.757)15 = -6.155. For score
13: Y = -6.155 + (.757)(13) = 3.686. For score 19: Y = -6.155 +
(.757)(19) = 8.228.


4. [Scatter diagram; vertical axis (Y): social studies test scores.]

5. Regression of X on Y: $b_{xy} = \frac{833 - \frac{(150)(52)}{10}}{352 - \frac{(52)^2}{10}} = .650$
6. For regression of X on Y: a = 15 - (.650)5.2 = 11.620. For score
2: X = 11.620 + (.650)(2) = 12.920. For score 7: X = 11.620 +
(.650)(7) = 16.170.

7. See scatter diagram in Exercise 2.

8. The symbol Y represents a value of Y predicted by the regression
equation for Y on X. The symbol b_yx represents the regression
coefficient for Y on X. The symbol X represents a value of X predicted
by the regression equation for X on Y. The symbol b_xy
represents the regression coefficient for X on Y.
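
The two regression equations in this set can be retraced with a short Python sketch. The function names are illustrative assumptions; the totals (N = 10, ΣXY = 833, ΣX = 150, ΣY = 52, ΣX² = 2320, ΣY² = 352) come from the formulas in Exercises 2 and 5. The coefficients are rounded to three places before the intercepts are computed, since that appears to be how the printed answers were obtained.

```python
def regression_coefficient(n, sum_xy, sum_pred, sum_crit, sum_pred_sq):
    """Slope for predicting the criterion variable from the predictor variable."""
    return (sum_xy - sum_pred * sum_crit / n) / (sum_pred_sq - sum_pred ** 2 / n)

n, sum_xy, sum_x, sum_y, sum_x2, sum_y2 = 10, 833, 150, 52, 2320, 352
mean_x, mean_y = sum_x / n, sum_y / n

# Regression of Y on X (predict Y from X)
b_yx = round(regression_coefficient(n, sum_xy, sum_x, sum_y, sum_x2), 3)
a_yx = mean_y - b_yx * mean_x

# Regression of X on Y (predict X from Y)
b_xy = round(regression_coefficient(n, sum_xy, sum_y, sum_x, sum_y2), 3)
a_xy = mean_x - b_xy * mean_y

def predict(a, b, score):
    """Predicted value from intercept a and slope b."""
    return a + b * score

print(b_yx, round(a_yx, 3), round(predict(a_yx, b_yx, 13), 3))
print(b_xy, round(a_xy, 3), round(predict(a_xy, b_xy, 7), 3))
```

Note that the two lines are different: predicting Y from X and predicting X from Y use different denominators, so b_yx and b_xy agree only when the correlation is perfect.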


Set 21 

1. Scores are considered dependent when they are for the same group 
of individuals or when the individuals have been matched on some 
variable. 

2. $s_{diff} = \sqrt{(2.6)^2 + (3.1)^2 - 2(.46)(2.6)(3.1)} = 2.992$

t = 4/2.992 = 1.337

3. At the .05 level of significance, the null hypothesis should be accepted.
This is a two-tailed test requiring t = 2.056 to be significant
at P = .0500.

4. Σd² = 56, s_diff = 2.828, t = 2.12.

5. This hypothesis requires a one-tailed test of significance. At df = 7
for significance at P = .0500, the t-ratio must be at least 1.895.
Therefore, the null hypothesis should be rejected.

6. The symbol D represents the difference between scores on the same 
variable for two individuals who have been matched on some other 
variable. 
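
The standard error of the difference for dependent samples in Exercise 2 can be checked with a few lines of Python. The function name is an illustrative assumption, and the interpretation of 2.6 and 3.1 as the two standard errors of the mean and of .46 as the correlation between the paired scores follows the structure of the formula; the mean difference of 4 is the numerator of the printed t-ratio.

```python
import math

def se_diff_dependent(se_1, se_2, r):
    """Standard error of the difference between means of correlated samples."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2 - 2 * r * se_1 * se_2)

se_diff = se_diff_dependent(2.6, 3.1, 0.46)
t_ratio = 4 / se_diff

print(round(se_diff, 3), round(t_ratio, 3))
```

Setting `r` to zero reduces the formula to the independent-samples case, which shows how a positive correlation between the paired scores shrinks the standard error.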


Set 22 


1. $\rho = 1 - \frac{6(40)}{12(144 - 1)} = .860$

2. N = 12. For P = .0100, ρ = .777. The rho computed in Exercise 1
exceeds the value required for significance at P = .0100. Therefore,
the null hypothesis that the number of hours of marksmanship
practice are not related to proficiency ratings should be rejected.


3. $\rho = 1 - \frac{6(202)}{10(100 - 1)} = -.224$

4. N = 10. For P = .0500, ρ = .648. The rho computed in Exercise 3
does not reach the value required for significance at P = .0500.
Therefore, you should accept the null hypothesis that the teacher
ratings of the children's cooperativeness and the children's self-ratings
are not related.


5. The symbol rho (ρ) represents the rank order correlation.
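
The rank order formula used in Exercise 1 is short enough to verify directly in Python. The function name `spearman_rho` is an illustrative assumption; the inputs are the sum of squared rank differences (40) and the number of pairs (12) from that exercise.

```python
def spearman_rho(sum_d_squared, n):
    """Rank order correlation from the sum of squared rank differences."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

rho = spearman_rho(40, 12)
critical_01 = 0.777   # Table 5 value for N = 12 at P = .0100

print(round(rho, 3), rho > critical_01)
```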


Set 23 


1. $\chi^2 = \frac{(8 - .5)^2}{27} + \frac{(8 - .5)^2}{27} = 4.16$

2. df = 1. For P = .0500, χ² = 3.84. The obtained chi square in Exercise 1
exceeds the value required at P = .0500. Therefore, the null
hypothesis should be rejected.


3. $\chi^2 = \frac{(12.5)^2}{22.5} + \frac{(9.5)^2}{22.5} + \frac{(4.5)^2}{22.5} + \frac{(17.5)^2}{22.5} = 25.47$

4. df = 3. For P = .0500, χ² = 7.82. The obtained chi square in Exercise 3
does exceed the value required at P = .0500. Therefore, the
null hypothesis should be rejected.


5. $\chi^2 = \frac{(7.7)^2}{\ldots} + \frac{(2.3)^2}{\ldots} + \ldots = 6.966$


6. df = 2. For P = .0500, χ² = 5.99. The obtained chi square in Exercise 5
exceeds the value required at P = .0500. Therefore, the null
hypothesis should be rejected. You conclude that the installation
of an employee's lounge is associated with increased productivity.

7. $\chi^2 = \frac{(20.5)^2}{150} + \frac{(20.5)^2}{150} = 5.60$



8. df = 1. For P = .0100, χ² = 6.64. The obtained chi square in Exercise 7
does not exceed the value required at P = .01. Therefore,
the null hypothesis should be accepted.

9. The symbol χ² represents chi square.
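
The chi square values in Exercises 1 and 7 both use the correction for continuity, in which .5 is subtracted from each absolute difference before squaring. The Python sketch below reproduces them. The function name is an illustrative assumption, and so are the observed frequencies: 35 and 19 (and 171 and 129) are hypothetical counts chosen only so that the differences from the expected frequencies match the 8 and 21 implied by the printed terms.

```python
def chi_square(observed, expected, yates=False):
    """Chi square statistic, optionally with the Yates correction for continuity."""
    total = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            # Reduce each absolute difference by .5, never below zero
            diff = max(diff - 0.5, 0.0)
        total += diff ** 2 / e
    return total

# Exercise 1: expected frequency 27 in each cell (observed counts are hypothetical)
chi2_ex1 = chi_square([35, 19], [27, 27], yates=True)

# Exercise 7: expected frequency 150 in each cell (observed counts are hypothetical)
chi2_ex7 = chi_square([171, 129], [150, 150], yates=True)

print(round(chi2_ex1, 3), round(chi2_ex7, 3))
```

Compared with the Table 6 values for df = 1, the first result exceeds 3.84 and the second falls short of 6.64, in line with the conclusions in Exercises 2 and 8.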


Set 24 

1. χ² = 9.67

2. df = 2. For P = .0500, χ² = 5.99. The obtained chi square in Exercise 1
exceeds the value required at P = .05. Therefore, the null
hypothesis should be rejected. The data indicate that boys have
a preference for basketball and do not prefer volleyball, whereas
girls have a preference for volleyball and do not prefer basketball.
There appears to be little difference in their preference of kickball.

3. χ² = 4.076

4. df = 12. For P = .0500, χ² = 21.03. The obtained chi square in
Exercise 3 does not exceed the value required at P = .05. Therefore,
the null hypothesis should be accepted. Your conclusion should
be that there is no difference between types of television programs
preferred by different nationality groups.




GLOSSARY AND INDEX



Glossary of Symbols and Abbreviations 


SYMBOL   MEANING                                                FRAME*

A.D.     Average Deviation                                      6-24
b_xy     Regression coefficient for variable X on variable Y    20-31
b_yx     Regression coefficient for variable Y on variable X    20-15
D        Difference between two scores or ranks                 4-42, 21-23
d        Deviation of differences between a pair of scores
         from the mean difference                               21-32
df       Degrees of freedom                                     10-24
E        Expected frequency (in chi square test)                23-9
F        Analysis of variance test                              17-4
f        Frequency of scores                                    1-15
f_w      Frequencies of scores within a class interval          2-21
i        Class interval                                         1-35
ll       Lower limits of a class interval                       1-46
Mdn      Median                                                 2-12
MS       Mean square (in analysis of variance)                  16-13
N        Number of raw scores or ranks                          1-20
O        Observed frequency (in chi square test)                23-9
P        Probability                                            8-2
Q        Semi-interquartile range                               5-8
Q1       Quartile designation (1st quartile)                    4-35


*"6-24" means set 6, frame 24. Frame and set designations refer the student to the
frame in which the discussion of the symbol begins, not necessarily the first one in
which it appears. It may occasionally be necessary to read several frames in order to
obtain an adequate definition of a symbol or abbreviation.
SYMBOL   MEANING                                                FRAME

r        Pearson product-moment correlation coefficient         19-3
s        Estimate of population standard deviation              8-38
s²       Estimate of population variance                        10-10
s_X̄      Estimate of the standard error of the mean             11-3
s_diff   Estimate of the standard error of the difference
         between means                                          14-2
SS_b     Sum of squares between groups (in analysis
         of variance)                                           16-13
SS_w     Sum of squares within groups (in analysis
         of variance)                                           16-13
SS_t     Sum of squares of total groups (in analysis
         of variance)                                           16-13
t        Sampling distribution of means of small samples        12-21
X        Raw score                                              1-16
X̄        Sample mean score                                      8-36
X        Estimated raw score (in regression equation)           20-30
x        Deviation of a raw score from the mean                 6-7
|x|      Absolute deviation score                               6-17
Y        Estimated raw score (in regression equation)           20-14
z        Standard normal deviate                                7-27
Greek Symbols


SYMBOL   MEANING                                                FRAME

Σ        The sum of                                             2-18
μ        Population mean score                                  3-3
σ        Population standard deviation                          6-35
σ_X̄      Standard error of sample means                         9-16
σ²       Population variance                                    10-15
σ_diff   Standard error of the difference between
         sample means                                           13-24
ρ        Rank order correlation                                 22-19
χ²       Chi square test of significance for
         frequency data                                         23-11
Index*


Absolute deviation score, 6-15
Analysis of variance, 16-1 
Average deviation, 6-5 

Best fit line, 20-6 
Biased estimate of 
variance, 10-17 
Bimodal distribution, 5-36 

Cells, in chi square test, 23-23 
Central tendency, 2-1 
Chi square test, 23-10 
multiple samples, 24-1 
Class interval, 1-31 
Coefficient, Pearson product- 
moment correlations, 19-1 
rank order correlation, 22-19 
regression, 20-19
Confidence interval 
five percent, 11-44 
one percent, 12-6 
Constants 
adding of, 4-1 
dividing by, 4-3 
multiplying by, 4-2 
subtracting of, 3-33 
Correlation 
negative, 18-29 
perfect, 18-24 
positive, 18-28 
rank order, 22-1 
zero, 18-45 


*"6-24" means set 6, frame 24.


Data, 1-2 

Degrees of freedom, 10-19 
pooled, 14-6 

Dependent variables, 21-2 
Deviation score, 6-7 

Expected frequencies in 
chi square test, 23-9 

F test 

analysis of variance, 17-4 
difference between two 
variance estimates, 17-26 
Frequency distribution, 1-13 
for grouped data, 1-29 
for ungrouped data, 1-27 
Frequency polygon, 5-17 
Frequency of scores, 1-12 

Hypothesis 
null, 13-5 
research, 15-6 

Independent samples, 14-22 
Inference, statistical, 11-15 
Interquartile range, 5-6

Lower limit of class 
interval, 1-46 

Mean, 3-1 

Mean square, 16-15 





Median, 2-9 
Mode, 2-1 

Normal curve, 6-29 
Normal distribution, 6-29 
Normal standard deviate, 7-27 
Null hypothesis, 13-5 

Observed frequencies, 

in chi square test, 23-8 
One-tailed test, 15-1 

Parameter, 8-43 
Pearson product-moment 

correlation coefficient, 19-1 
Percentiles, 4-23 
Population, 8-19 
Probability, 8-1 

Quartiles, 4-35 

Range, 5-2 
Rank order 
coefficient, 22-19 
correlation, 22-1 
Rank-ordered scores, 22-3 
Raw score, 1-3 

Real limits of a class interval, 1-44 
Regression coefficient, 20-19 
equation, 20-23 
line, 20-5 

Research hypothesis, 15-6 

Sample, 8-19 
random, 8-26 
representative, 8-25 
Sampling distribution of 
means, 9-11 
Sampling error, 9-1 
Scatter diagrams, 18-1 
Semi-interquartile range, 5-6 


Skewed distribution, 5-37 
negatively skewed, 5-38 
positively skewed, 5-40 
Smooth curve, 5-35 
Standard deviation 
calculation by deviation 
score method, 7-1 
calculation by raw score 
method, 7-14 
definition of, 6-34 
Standard error of the difference 
between means, 12-22 
Standard error of the mean, 9-15 
Statistic, 8-44 
Statistical inference, 11-15 
Sum of squares, 10-3 

t distribution, 12-21 
t-ratio, 14-21

for dependent samples, 21-2 
direct difference method, 21-21 
for independent samples, 14-22 
Terminal statistics, 5-53 
Two-tailed test, 15-2 
Type I and Type II errors, 15-41 

Unbiased variance estimate, 10-18 
Unimodal distribution, 5-44 
Upper limits of a class 
interval, 1-50 

Variability, 5-1 
Variance, 10-9 
Variance 
biased, 10-17 
estimate, 10-14 
unbiased, 10-18 

Yates correction 

for continuity, 23-14 
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